Abstract

Let $X_1, \ldots, X_n$ be a sequence of n classical random variables and consider a sample
$X_{s_1}, \ldots, X_{s_r}$ of r < n positions selected at random. Then, except with (exponentially in
r) small probability, the min-entropy $H_{\min}(X_{s_1} \cdots X_{s_r})$ of the sample is not smaller than,
roughly, a fraction $r/n$ of the overall entropy $H_{\min}(X_1 \cdots X_n)$, which is optimal.

Here, we show that this statement, originally proved in [S. Vadhan, LNCS 2729, Springer, 
2003] for the purely classical case, is still true if the min-entropy Hmin is measured relative to 
a quantum system. Because min-entropy quantifies the amount of randomness that can be ex- 
tracted from a given random variable, our result can be used to prove the soundness of locally 
computable extractors in a context where side information might be quantum-mechanical. 
In particular, it implies that key agreement in the bounded-storage model, using a standard
sample-and-hash protocol, is fully secure against quantum adversaries, thus solving a
long-standing open problem.
1 Introduction

Let X be a classical random variable and let E be a (generally quantum-mechanical) system whose
state might be correlated to X. The min-entropy of X given E, denoted $H_{\min}(X|E)$, is a natural
measure for the uncertainty on the value of X given access to the side information E. More
precisely, $H_{\min}(X|E)$ corresponds to the maximum length of a bitstring R which is (a) uniquely
determined by X and (b) virtually uniform and independent of E.^1

Here, we study the following question initiated by Nisan and Zuckerman [NZ96].^2 Given a
sequence $X_1, \ldots, X_n$ of n classical random variables with min-entropy (relative to side information
E) $H_{\min}(X_1 \cdots X_n|E) \geq n\mu$, for some $\mu$, what is the min-entropy $H_{\min}(X_{s_1} \cdots X_{s_r}|E)$
of a randomly selected sample $X_{s_1}, \ldots, X_{s_r}$ of r positions? In other words, we are starting with a
sequence $X_1, \ldots, X_n$ which contains at least $n\mu$ bits of uniform (relative to E) randomness, and
we are interested in the amount of uniform (again relative to E) randomness of the subsequence
$X_{s_1}, \ldots, X_{s_r}$.

As a main result, we show that the min-entropy per position is preserved under sampling, i.e., 

$$\frac{1}{n} H_{\min}(X_1 \cdots X_n|E) \geq \mu \quad \text{implies} \quad \frac{1}{r} H_{\min}(X_{s_1} \cdots X_{s_r}|E) \geq \mu + o(1)$$

(except with probability exponentially small in r). This generalizes a result by Vadhan [Vad03],
who considered the case where E is purely classical.^3

A main application of this result is in the context of randomness extraction. It relies on the
leftover-hash lemma [ILL89] (see also [BBCM95]) or, more precisely, its quantum generalization
[Ren05] (see also [KMR05, RK05]), saying that the randomness of a classical random variable
X, measured in terms of the min-entropy, can be extracted by applying a suitable hash function.
That is, X can be mapped to a string Z of size (roughly) $H_{\min}(X|E)$ which is virtually uniform and
independent of E. Our result now implies that, given a long sequence $X_1, \ldots, X_n$ with sufficient
min-entropy, random bits can be obtained by the sample-and-hash technique, i.e., first sampling a
subsequence $X_{s_1}, \ldots, X_{s_r}$ and then applying a two-universal hash function.

The sample-and-hash technique is of interest in cryptography, in particular in the context
of the bounded storage model [Mau92]. Here, the security of cryptographic schemes is based on
the assumption that a string of random variables $X_1, \ldots, X_n$, called randomizer, is temporarily
available for public access, but too long to be stored on a computer, even by a potential adversary.
The idea then is to use this string as a source of secret randomness.

^1 See Lemma 5.1 of Section 5.2 for a mathematically precise statement.
^2 Nisan and Zuckerman considered the special case where E is classical.
^3 If the system E is purely classical, it can generally be omitted in the analysis, as explained in Section 2.5.
Based on the original work by Maurer [Mau92], various schemes for key expansion in the
bounded storage model have been proposed [DM02, DM04, Lu02, Vad03]. These are mostly based
on the sample-and-hash technique described above. More precisely, a short initial string is used
for selecting positions of the randomizer $X_1, \ldots, X_n$. Then a hash function is applied to extract a
key Z.

Because the min-entropy of the randomizer $X_1, \ldots, X_n$ given the information E stored by an
adversary, $H_{\min}(X_1 \cdots X_n|E)$, is necessarily large, our result implies that the final key Z is indeed
uniform relative to E and, hence, secret. In other words, our result proves that key expansion
in the bounded storage model is possible in the context of a quantum adversary. It generalizes
previous results [DM04, Lu02, Vad03] where security has been proved under the assumption that
the adversary is purely classical.

Outline 

The paper is organized as follows: We first cover some background material on randomness
extraction in Section 2. In Section 3, we discuss our main result and its relation to prior work. Section 4
provides an informal overview of the central ideas involved in the proof. The remainder of the
paper is devoted to a formal derivation of our main results; in Section 5, we establish the required
properties of min-entropy. We subsequently apply these to the problem at hand in Section 6,
where we derive our main result. We conclude in Section 7 by giving explicit parameters for key
expansion in the bounded storage model.

2 Basic definitions and known results 

2.1 Randomness extractors 

Randomness extraction, i.e., the process of transforming partially random data X into a uniformly 
distributed string Z, plays an important role in computer science and, in particular, cryptography. 
For example, it is used to generate secure keys, given only partially secret raw data. One of the 
most fundamental results in the area of randomness extraction is the leftover-hash lemma [ILL89].
It states that the number of uniform bits that can be extracted from a given random variable X
by two-universal hashing (i.e., by applying a function chosen at random from a two-universal set
of hash functions) is roughly equal to the min-entropy^4 of X defined by

$$H_{\min}(X) := -\log \max_x P_X(x) . \qquad (1)$$

We can express this result more formally by saying that two-universal hashing is an extractor.
A $(k, \varepsilon)$-extractor is a function $\mathrm{Ext} : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ with the property that the random variable
$Z = \mathrm{Ext}(X, Y)$ is $\varepsilon$-close to uniform,^5 i.e.,

$$\frac{1}{2} \| P_{\mathrm{Ext}(X,Y)} - P_{U_{\mathcal{Z}}} \| \leq \varepsilon ,$$

whenever X is a random variable with min-entropy $H_{\min}(X) \geq k$ and Y is an independent
and uniform seed, i.e., $P_Y = P_{U_{\mathcal{Y}}}$. (Here $P_{U_{\mathcal{Z}}}$ denotes the uniform distribution on $\mathcal{Z}$.) A
strengthening of this notion is the concept of a strong extractor, whose output is required to be
uniform even conditioned on the seed Y. A strong $(k, \varepsilon)$-extractor satisfies the inequality

$$\frac{1}{2} \| P_{\mathrm{Ext}(X,Y)Y} - P_{U_{\mathcal{Z}}} \cdot P_{U_{\mathcal{Y}}} \| \leq \varepsilon \qquad (2)$$

for all $P_{XY} = P_X \cdot P_{U_{\mathcal{Y}}}$ with $H_{\min}(X) \geq k$. Two-universal hashing corresponds to a strong
$(k, \varepsilon)$-extractor with $\ell$ bits of output, for any $\varepsilon > 0$ and $k \geq \ell + 2 \log 1/\varepsilon$.
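
As a small numerical illustration of the two quantities just introduced, the following Python sketch computes the min-entropy of an explicit distribution and the largest output length $\ell$ compatible with the condition $k \geq \ell + 2 \log 1/\varepsilon$. The function names are illustrative only, and logarithms are assumed to be taken to base 2.

```python
import math

def min_entropy(probs):
    """H_min(X) = -log2 max_x P_X(x) for a probability vector."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -math.log2(max(probs))

def extractable_bits(k, eps):
    """Largest integer l with k >= l + 2*log2(1/eps), floored at 0."""
    return max(0, math.floor(k - 2 * math.log2(1 / eps)))

# A source on 8 values that is far from uniform: one value has weight 1/2.
probs = [0.5] + [0.5 / 7] * 7
print(min_entropy(probs))                # 1.0 bit, although |X| = 8
print(extractable_bits(256, 2 ** -40))   # 176
```

The example shows that min-entropy is governed by the single most likely outcome, and that each factor of 2 in the target distance $\varepsilon$ costs two bits of output.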

^4 In the literature, the quantity $H_{\min}$ is also denoted $H_\infty$ and called Rényi entropy of order $\infty$.
^5 The $L_1$-norm of a function $f : \mathcal{Z} \to \mathbb{R}$ is defined as $\|f\| := \sum_{z \in \mathcal{Z}} |f(z)|$.
While two-universal hashing is optimal in the number $H_0(Z) = \log |\mathcal{Z}|$ of bits it can extract,
it is not usable in certain applications. For example, computing the output $Z = \mathrm{Ext}(X, Y)$ might
be infeasible, e.g., if the initial number $H_0(X) = n$ of bits is too large to be processed by a limited
computational device. Also, in cryptographic scenarios, the seed Y is sometimes a (secret) key of
limited size (e.g., $H_0(Y) = O(\log n)$) compared to the length of X. Thus it is natural to try to
find extractors with additional properties, such as efficient computability or limited seed length.
An example of such a requirement which is important for applications in the bounded storage
model is local computability; in other words, if $X = (X_1, \ldots, X_n)$ consists of a large number n of
blocks (or bits), the output $\mathrm{Ext}(X, Y)$ should only depend on a small subset $X_S = (X_{s_1}, \ldots, X_{s_r})$
of these values, where $S = \{s_1, \ldots, s_r\} = S(y) \subset [n] = \{1, \ldots, n\}$ specifies the subset for every
$y \in \mathcal{Y}$. In other words, these extractors are of the form $\mathrm{Ext}(X, Y) = f(X_{S(Y)}, Y)$.

2.2 Randomness condensers 

With the aim of finding other constructions of extractors, it is natural to consider weaker notions 
of randomness generation. One natural way to generalise the concept of a randomness extractor 
is to require that the output is only close to a random variable with high min-entropy (instead of 
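being close to a uniform random variable). This leads to the definition of a $(k, k', \varepsilon)$-condenser:
This is a function $\mathrm{Cond} : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ such that for all random variables X with $H_{\min}(X) \geq k$,
there is a random variable $\bar{Z}$ with $H_{\min}(\bar{Z}) \geq k'$ such that

$$\frac{1}{2} \| P_{\mathrm{Cond}(X,Y)} - P_{\bar{Z}} \| \leq \varepsilon ,$$

where Y is a uniform and independent seed on $\mathcal{Y}$. In terms of the so-called smooth min-entropy^6
$H^\varepsilon_{\min}(Z) := \sup_{\bar{P}_Z : \|\bar{P}_Z - P_Z\| \leq \varepsilon} H_{\min}(\bar{Z})$, this requirement is simply expressed by

$$H^\varepsilon_{\min}(\mathrm{Cond}(X, Y)) \geq k' .$$

The notion of a condenser is a strict generalisation of the notion of an extractor. Indeed, a
$(k, \varepsilon)$-extractor $\mathrm{Ext} : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ is a $(k, \log |\mathcal{Z}|, \varepsilon)$-condenser and vice versa.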

Again, a stronger version of condensers is obtained by requiring that $\mathrm{Cond}(X, y)$ has high
smooth entropy with high probability over y. The analog of (2) defining a strong $(k, k', \varepsilon)$-condenser
then is the requirement that for every X with $H_{\min}(X) \geq k$, there exists a joint distribution $P_{\bar{Z}Y}$
such that

$$\frac{1}{2} \| P_{\mathrm{Cond}(X,Y)Y} - P_{\bar{Z}Y} \| \leq \varepsilon ,$$

where Y is independent of X with uniform distribution $P_Y = P_{U_{\mathcal{Y}}}$ on $\mathcal{Y}$, and $H_{\min}(\bar{Z}|Y) \geq k'$.
Here, the conditional min-entropy is defined as

$$H_{\min}(Z|Y) := -\log \sum_y P_Y(y) \max_z P_{Z|Y=y}(z) .$$

As before, this requirement is equivalent to demanding that

$$H^\varepsilon_{\min}(\mathrm{Cond}(X, Y)|Y) \geq k' ,$$

where $H^\varepsilon_{\min}(Z|Y) := \sup_{\|\bar{P}_{ZY} - P_{ZY}\| \leq \varepsilon} H_{\min}(\bar{Z}|\bar{Y})$ is the conditional smooth min-entropy. With
this definition, a function $\mathrm{Ext} : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ is a strong $(k, \varepsilon)$-extractor if and only if it is a strong
$(k, \log |\mathcal{Z}|, \varepsilon)$-condenser.
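
The classical conditional min-entropy defined above is easy to evaluate for an explicit joint distribution, since $P_Y(y) \max_z P_{Z|Y=y}(z) = \max_z P_{ZY}(z, y)$. The following Python sketch does this; the representation of the joint distribution as a nested list and the function name are assumptions made for illustration, and logarithms are taken to base 2.

```python
import math

def conditional_min_entropy(joint):
    """H_min(Z|Y) in bits, where joint[y][z] = P_{ZY}(z, y).

    The sum over y of max_z P_{ZY}(z, y) is the probability of guessing Z
    correctly when Y is known.
    """
    guessing_probability = sum(max(row) for row in joint)
    return -math.log2(guessing_probability)

# Y is a uniform bit. If Y = 0, Z is uniform on 4 values; if Y = 1, Z is fixed.
joint = [
    [0.125, 0.125, 0.125, 0.125],
    [0.5, 0.0, 0.0, 0.0],
]
h = conditional_min_entropy(joint)
print(h)  # -log2(0.625), about 0.678 bits
```

Note that the quantity averages the guessing probability over y before taking the logarithm, so a single bad value of y (here Y = 1, where Z is known) lowers the entropy without forcing it to zero.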

^6 The supremum ranges over all subnormalised probability distributions $\bar{P}_Z$, that is, functions $\bar{P}_Z : \mathcal{Z} \to [0,1]$
satisfying $\sum_{z \in \mathcal{Z}} \bar{P}_Z(z) \leq 1$.
2.3 Constructing locally computable extractors: The sample-and-hash
approach

Condensers can be used as a building block for constructing extractors. A possible way of obtaining
a new construction is by applying an extractor to the output of a condenser. More precisely,
suppose that

$\mathrm{Cond} : \mathcal{X}_C \times \mathcal{Y}_C \to \mathcal{X}_E$ is a $(k_C, k_E, \varepsilon_C)$-condenser, and

$\mathrm{Ext} : \mathcal{X}_E \times \mathcal{Y}_E \to \mathcal{Z}_E$ is a $(k_E, \varepsilon_E)$-extractor.

It is easy to see that in this situation, the function

$$\overline{\mathrm{Ext}} : \mathcal{X}_C \times (\mathcal{Y}_C \times \mathcal{Y}_E) \to \mathcal{Z}_E , \qquad (x_C, (y_C, y_E)) \mapsto \mathrm{Ext}(\mathrm{Cond}(x_C, y_C), y_E)$$

is a $(k_C, \varepsilon_C + \varepsilon_E)$-extractor. This is because the condenser Cond generates a random variable with
a sufficient amount of min-entropy for Ext. This conclusion is also true for the strong versions of
these notions: if Cond and Ext are a strong condenser and a strong extractor, respectively, then
the function $\overline{\mathrm{Ext}}$ is a strong extractor.

Let us now return to the problem of constructing locally computable extractors. Clearly, if
$\mathrm{Cond}(X, Y) = \mathrm{Cond}((X_1, \ldots, X_n), Y)$ is of the form $\mathrm{Cond}(X, Y) = X_{S(Y)}$, where $S(y) \subset [n]$ is
a subset of indices for every $y \in \mathcal{Y}$, then the previous construction results in an extractor of
the form $\overline{\mathrm{Ext}}(X, Y) = \overline{\mathrm{Ext}}((X_1, \ldots, X_n), (Y_C, Y_E)) = \mathrm{Ext}(X_{S(Y_C)}, Y_E)$. This extractor is clearly
locally computable. This way of building a locally computable extractor by first sampling a few
indices specified by $S(Y_C)$ at random and then applying an extractor is called the sample-and-hash
approach. Building locally computable extractors is thus reduced to the problem of constructing
condensers of the form $\mathrm{Cond}(X, Y) = X_{S(Y)}$.
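
The composition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a construction taken from this paper: the condenser seed is a uniformly random r-subset of block positions, and the extractor is the family of all linear maps over GF(2) (a random binary matrix), which is one standard two-universal family. All function names, the block representation as integers, and the parameter values are hypothetical, and `random.Random` is used only to make the example reproducible (it is not a cryptographic source of randomness).

```python
import random

def sample_positions(n, r, rng):
    """Condenser seed: a uniformly random r-subset of {0, ..., n-1}."""
    return sorted(rng.sample(range(n), r))

def random_linear_hash(in_bits, out_bits, rng):
    """Extractor seed: a random out_bits x in_bits binary matrix, one bitmask per row."""
    return [rng.getrandbits(in_bits) for _ in range(out_bits)]

def apply_hash(rows, x):
    """Compute the matrix-vector product M x over GF(2); x holds the input bits."""
    out = 0
    for row in rows:
        parity = bin(row & x).count("1") & 1
        out = (out << 1) | parity
    return out

def sample_and_hash(blocks, block_bits, positions, rows):
    """Read only the sampled blocks, concatenate them, and hash the result."""
    x = 0
    for i in positions:
        x = (x << block_bits) | blocks[i]
    return apply_hash(rows, x)

rng = random.Random(7)
n, r, block_bits, out_bits = 1000, 20, 32, 64
blocks = [rng.getrandbits(block_bits) for _ in range(n)]    # stand-in for the long string X
positions = sample_positions(n, r, rng)                     # seed part Y_C
rows = random_linear_hash(r * block_bits, out_bits, rng)    # seed part Y_E
key = sample_and_hash(blocks, block_bits, positions, rows)
```

The output depends only on the r sampled blocks, which is exactly the local computability property: a party holding the seed needs to read a small part of the long string.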



2.4 Averaging samplers are condensers: preservation of min-entropy 
rates 

Consider a sequence of random variables $X = (X_1, \ldots, X_n)$ on $\mathcal{X}^n$ and assume that the min-entropy
rate $R_{\min}(X) := H_{\min}(X)/H_0(X)$ is lower bounded by $\mu$, i.e.,

$$\frac{H_{\min}(X)}{n \log |\mathcal{X}|} \geq \mu .$$

We will call the quantity $R_{\min}(X)$ on the lhs the min-entropy rate of X. Suppose further that
we select r of these random variables at random, resulting in a subset $X_S = (X_{s_1}, \ldots, X_{s_r})$
corresponding to indices $S = \{s_1, \ldots, s_r\}$. Intuitively, one would expect that with high probability
over the choice of S, the amount of randomness contained in such a sample is proportional to its
size $r = |S|$, i.e.,

$$\mu - \delta \leq R^\varepsilon_{\min}(X_S) = \frac{1}{r \log |\mathcal{X}|} H^\varepsilon_{\min}(X_S) \qquad (3)$$

for some small $\delta > 0$. In other words, we expect the min-entropy rate to be preserved under
sampling. Indeed, as shown by Vadhan [Vad03] (improving on previous work by Nisan and
Zuckerman [NZ96]), inequality (3) is correct with high probability (over the choice of the sample
$S = \{s_1, \ldots, s_r\}$). In the terminology of condensers, this is saying that the function

$$\mathrm{Cond} : \mathcal{X}^n \times \binom{[n]}{r} \to \mathcal{X}^r , \qquad ((X_1, \ldots, X_n), S) \mapsto X_S$$

is a $(\mu n \log |\mathcal{X}|, (\mu - \delta) r \log |\mathcal{X}|, \varepsilon)$-condenser for some small $\delta, \varepsilon > 0$. We call this function the
$\binom{n}{r}$-subset condenser.
Neglecting issues related to computational complexity, a condenser of the form $\mathrm{Cond}(X, Y) =
X_{S(Y)}$ is fully specified by the distribution $P_S = P_{S(Y)}$ over subsets of [n]. The $\binom{n}{r}$-subset
condenser is simply represented by the uniform distribution over all subsets $S \subset [n]$ of size $|S| = r$.

It is natural to ask which distributions over subsets S give rise to good condensers. Intuitively,
a necessary condition is that the set of subsets S covers [n] well in some sense. In fact,
Vadhan [Vad03] showed that it suffices for S to be a so-called averaging sampler, i.e., a distribution
over subsets of [n] which can be used to approximate the average of any n values. Formally, such
a sampler is defined as follows:

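Definition 2.1. An $(n, \xi, \varepsilon)$-sampler is a probability distribution $P_S$ over subsets $S \subset [n]$ with the
property that

$$\Pr_S \left[ \frac{1}{|S|} \sum_{i \in S} \beta_i \leq \frac{1}{n} \sum_{i=1}^n \beta_i - \xi \right] \leq \varepsilon \quad \text{for all } (\beta_1, \ldots, \beta_n) \in [0,1]^n . \qquad (4)$$

For simplicity, we will assume that $P_S$ is completely supported on subsets of the same size, and
refer to this size as $|S| < n$.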

Observe that we only consider a one-sided error.^7 We will call $\xi$ the accuracy of the sampler,
and $\varepsilon$ its failure probability. Returning to our example, the uniform distribution over subsets of a
fixed size is an averaging sampler with the following parameters.

Lemma 2.2. Let r < n and let $P_S$ be the uniform distribution over subsets $S \subset [n]$ of size $|S| = r$.
This defines a $(n, \xi, e^{-r \xi^2 / 2})$-sampler for every $r > 0$ and $\xi \in [0, 1]$.

This statement is a consequence of the Hoeffding-Azuma inequality and given as Lemma 5.5
of [BH05]. We call this sampler simply the $\binom{n}{r}$-subset sampler. It will be sufficient for our
purposes, but our results hold more generally for arbitrary averaging samplers.
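
The behaviour of the uniform subset sampler can be checked empirically. The following Python sketch estimates, for one fixed vector of values in [0, 1], how often the average over a random r-subset falls short of the true average by at least the accuracy parameter. For comparison it prints the classical Hoeffding tail bound $e^{-2 r \xi^2}$, which is known to hold for sampling without replacement as well; this bound is used here only as a reference point and is an assumption of the example rather than the constant of the lemma above. The test vector, parameter values and function name are illustrative.

```python
import math
import random

def empirical_failure_rate(beta, r, xi, trials, rng):
    """Fraction of random r-subsets whose sample mean is at most (true mean - xi)."""
    n = len(beta)
    mean = sum(beta) / n
    failures = 0
    for _ in range(trials):
        subset = rng.sample(range(n), r)          # uniform subset of size r
        sample_mean = sum(beta[i] for i in subset) / r
        if sample_mean <= mean - xi:              # one-sided error event
            failures += 1
    return failures / trials

rng = random.Random(1)
beta = [1.0] * 500 + [0.0] * 500                  # half ones, half zeros
r, xi = 100, 0.15
rate = empirical_failure_rate(beta, r, xi, trials=20000, rng=rng)
hoeffding = math.exp(-2 * r * xi ** 2)
print(rate, hoeffding)
```

A single test vector does not verify the sampler property, which quantifies over all vectors, but it illustrates how quickly the failure probability decays with the sample size r.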

Vadhan showed that in the same way as the $\binom{n}{r}$-subset sampler gives rise to the $\binom{n}{r}$-subset
condenser, any averaging sampler defines a corresponding condenser (with appropriate
parameters). In other words, a probability distribution $P_S$ over subsets of [n] with the sampler
property (4) preserves the min-entropy rate when picking a random subset, in the sense of (3).

2.5 Extractors, condensers and prior classical information 

In cryptographic settings, it is often desirable to generate randomness which is not only (close to)
uniform, but also independent of an adversary's prior information. We first consider the case where
the adversary is classical, such that her information is described by a random variable E. In other
words, the task is to generate a key Z satisfying $\frac{1}{2} \| P_{ZE} - P_{U_{\mathcal{Z}}} \cdot P_E \| \leq \varepsilon$, where E summarises the
adversary's knowledge.

Suppose the initial situation is described by a joint distribution $P_{XE}$, where X is held by
the honest parties, and the adversary holds E. We will assume that the adversary's information
about X is limited; this is expressed by a lower bound on the conditional entropy $H_{\min}(X|E)$.
Conveniently, a strong $(k, \varepsilon)$-extractor achieves key extraction in this setup, when invoked with
(public) independent randomness Y. That is, we have

$$\frac{1}{2} \| P_{\mathrm{Ext}(X,Y)YE} - P_{U_{\mathcal{Z}}} \cdot P_{U_{\mathcal{Y}}} \cdot P_E \| \leq 2\varepsilon \qquad (5)$$

for all $P_{XE}$ with $H_{\min}(X|E) \geq k + \log 1/\varepsilon$. In other words, if the adversary's initial prior
information E about X is limited, the extracted key $Z = \mathrm{Ext}(X, Y)$ will look uniform to the adversary

^7 We point out that the notion of samplers is usually defined differently in the computer science literature. There,
a sampler is an algorithm which efficiently approximates the average of a large number of values. The aim is to give
an estimate of the average $\frac{1}{n} \sum_{i=1}^n \beta_i$ of an (arbitrary) vector $(\beta_1, \ldots, \beta_n) \in [0, 1]^n$ whose entries are accessible
in the form of an oracle. Here we restrict our attention to so-called averaging samplers: These output the value
$\frac{1}{|S|} \sum_{i \in S} \beta_i$ of a (randomly) chosen subset $S \subset [n]$ of values. For a more detailed discussion of samplers and their
computational aspects, see [Gol97].
even if she is given the seed of the extractor (i.e., $E' = (E, Y)$). This procedure of using public
(independent) randomness to generate secret keys from partially secret information is well-known
as privacy amplification [BBCM95] (usually in conjunction with two-universal hashing as an
extractor).

Inequality (5) is a trivial application of Markov's inequality; it is obtained by applying the
extractor property to the conditional distributions $P_{X|E=e}$. A similar conclusion holds more
generally for any strong $(k, k', \varepsilon)$-condenser: Here we have

$$H^{2\varepsilon}_{\min}(\mathrm{Cond}(X, Y)|YE) \geq k' \qquad (6)$$

for all joint distributions $P_{XE}$ with $H_{\min}(X|E) \geq k + \log 1/\varepsilon$. This means that the problem
of randomness extraction in the context of prior classical information essentially reduces to the
randomness generation problem without any side-information.

2.6 Extractors, condensers and prior quantum information 

The mentioned property of extractors and condensers fails to be true in cases where the adversary's
prior information E is quantum. Indeed, in this case, the conditional distributions $P_{X|E=e}$ are no
longer defined, and the analysis of randomness extraction has to be done differently.

The relevant concepts in this modified setup are sufficiently straightforward to define: Consider
a classical random variable X and a quantum system E which is correlated to this variable. This
situation is completely described by a classical-quantum state $\rho_{XE} = \sum_{x \in \mathcal{X}} P_X(x) |x\rangle\langle x| \otimes \rho^x_E$
(where $\{|x\rangle\}_{x \in \mathcal{X}}$ is an orthonormal basis), or equivalently the ensemble $\{P_X(x), \rho^x_E\}_{x \in \mathcal{X}}$ on E.

For the purpose of randomness extraction, the relevant measure of min-entropy is the conditional
min-entropy $H_{\min}(X|E)$ introduced in [Ren05]; this quantity is defined by^8

$$H_{\min}(X|E) := -\log \min_{\sigma_E} \min \{ \lambda : \rho_{XE} \leq \lambda \cdot \mathrm{id}_X \otimes \sigma_E \} .$$

The conditional min-entropy generalizes the classical min-entropy (1). For classical-quantum states
$\rho_{XE}$, the min-entropy $H_{\min}(X|E)$ characterises the amount of uniform randomness $Z = f(X)$ that
can be extracted from X such that Z is independent of E.

In terms of this measure of prior information, a $(k, \varepsilon)$-strong quantum extractor is a function
$\mathrm{Ext} : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ with the property that (cf. (5))

$$\frac{1}{2} \| \rho_{\mathrm{Ext}(X,Y)YE} - \rho_{U_{\mathcal{Z}}} \otimes \rho_{U_{\mathcal{Y}}} \otimes \rho_E \| \leq \varepsilon \qquad (7)$$

for all classical-quantum states $\rho_{XE}$ with $H_{\min}(X|E) \geq k$. In this expression, Y is an independent
and uniform seed on $\mathcal{Y}$, and $\rho_{U_{\mathcal{Z}}}$ denotes the completely mixed state on $\mathcal{Z}$, i.e., the state
$\frac{1}{|\mathcal{Z}|} \sum_{z \in \mathcal{Z}} |z\rangle\langle z|$. Clearly, a $(k, \varepsilon)$-strong quantum extractor is a $(k, \varepsilon)$-strong extractor in the
original (classical) sense. The converse is not true in general (see [GKK+07] for a particularly striking
example in the bounded storage model). However, the left-over hash lemma can be generalised
to the quantum case: the two-universal hashing construction $\mathrm{Ext} : \{0, 1\}^n \times \{0, 1\}^n \to \{0, 1\}^\ell$
is a $(k, \varepsilon)$-strong quantum extractor for any $k \geq \ell + 2 \log 1/\varepsilon$, as shown by Renner [Ren05].
(The optimality of this extractor with respect to the number of extracted bits is shown below in
Lemma 5.1.) As with classical extractors, an important goal is to find constructions which are
more randomness-efficient, and satisfy additional properties such as local computability.

Similarly, a $(k, k', \varepsilon)$-strong quantum condenser $\mathrm{Cond} : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ is defined by the requirement
(cf. (6))

$$H^\varepsilon_{\min}(\mathrm{Cond}(X, Y)|YE) \geq k'$$

for all $\rho_{XE}$ with $H_{\min}(X|E) \geq k$. In this expression, the smooth min-entropy $H^\varepsilon_{\min}(X|E)$ is
defined by a maximisation over a set of operators in the vicinity of $\rho_{XE}$. Note that there is

^8 This definition is meaningful for arbitrary bipartite states $\rho_{XE}$, even with non-classical part X.
a certain freedom in these definitions (the only constraint is the preservation of the desirable
composability properties). We choose to define the smooth min-entropy as

$$H^\varepsilon_{\min}(X|E)_\rho = \sup_{\bar{\rho}_{XE} :\ \|\bar{\rho}_{XE} - \rho_{XE}\| \leq \varepsilon,\ \mathrm{tr}(\bar{\rho}_{XE}) \leq 1} H_{\min}(X|E)_{\bar{\rho}} ,$$

where the maximisation is over all subnormalised nonnegative operators $\bar{\rho}_{XE}$ in an $\varepsilon$-ball around
$\rho_{XE}$, and the quantity on the rhs is the min-entropy of the corresponding operator (see below for a
formal definition). As shown in [Ren05], if X is classical, this supremum is achieved by an operator
$\bar{\rho}_{XE}$ which is classical on X. To guarantee compatibility of quantum condensers and extractors, we
require a $(k, \varepsilon)$-strong quantum extractor to satisfy (7) for all subnormalised nonnegative operators
$\rho_{XE}$ with classical part X and $H_{\min}(X|E) \geq k$. This is true for two-universal hashing, as the
analysis in [Ren05] shows.



3 Our contribution 

3.1 Main result: samplers are quantum condensers 

Our main result states that samplers can be used to "condense" min-entropy even in a quantum
context, in the same way as they give rise to randomness condensers for classical distributions (as
discussed in Section 2.4). More precisely, we consider an n-tuple $X^n = (X_1, \ldots, X_n)$ of random
variables on $\mathcal{X}^n$, where $\mathcal{X}$ is a (large) alphabet. We show that relative to a quantum system E,
the min-entropy rate is preserved when picking a random subset $X_S$ (using a sampler).
To express this in a concise form, we introduce the min-entropy rates

$$R^\varepsilon_{\min}(A|B)_\rho := \frac{H^\varepsilon_{\min}(A|B)_\rho}{H_0(A)} , \qquad (8)$$

where $H_0(A) = \log |\mathcal{A}|$ is the logarithm of the alphabet size of A. Our main result states that this quantity is
approximately preserved under sampling. Clearly, when applied to a $(n, \xi, \varepsilon)$-sampler, such a
statement must depend on the accuracy $\xi$ of the sampler and its failure probability $\varepsilon$. For such a
sampler and the situation described above, our main result is given by the inequality

$$R^{\varepsilon'}_{\min}(X_S|SE)_\rho \geq R_{\min}(X^n|E)_\rho - 3\xi - 2\kappa \log 1/\kappa , \qquad (9)$$

where the parameters $\varepsilon'$ and $\kappa$ are equal to

$$\varepsilon' = 2 \cdot 2^{-\xi n \log |\mathcal{X}|} + 3 \varepsilon^{1/4} \quad \text{and} \quad \kappa = \frac{n}{|S| \log |\mathcal{X}|} .$$

(This result is stated as Corollary 6.19 below.) This inequality shows that (for appropriate alphabet
sizes) the min-entropy rate is preserved, up to the accuracy of the sampler. As expected, the failure
probability $\varepsilon$ of the sampler is reflected in the distance (i.e., the smoothness parameter $\varepsilon'$). In fact,
this distance mainly depends on the failure probability of the sampler, and the term $2 \cdot 2^{-\xi n \log |\mathcal{X}|}$
is usually negligible.

Observe that the expression $2\kappa \log 1/\kappa$ on the rhs of (9) goes to zero as $\kappa \to 0$. The parameter
$\kappa$ captures the alphabet sizes in the problem; our result applies to regions where $\kappa$ is small. As
$|S| < n$, this is equivalent to demanding that $\mathcal{X}$ is a large alphabet. Thus we will henceforth
assume that the random variables $X_i$ are large "blocks" (instead of individual bits).
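
To get a feeling for the size of the loss terms, the following Python sketch evaluates the parameters of the main inequality for one set of example values. It assumes the reading $\kappa = n / (|S| \log |\mathcal{X}|)$ and $\varepsilon' = 2 \cdot 2^{-\xi n \log |\mathcal{X}|} + 3 \varepsilon^{1/4}$ with logarithms to base 2; the function name and the chosen numbers are purely illustrative and not taken from the paper.

```python
import math

def sampling_loss(n, r, log_alphabet, xi, eps):
    """Return (kappa, rate_loss, eps_prime) for an (n, xi, eps)-sampler with |S| = r.

    log_alphabet is log2 of the alphabet size, i.e. the block length in bits.
    """
    kappa = n / (r * log_alphabet)
    rate_loss = 3 * xi + 2 * kappa * math.log2(1 / kappa)
    # The first term underflows to 0.0 for large blocks, reflecting that it is negligible.
    eps_prime = 2 * 2.0 ** (-xi * n * log_alphabet) + 3 * eps ** 0.25
    return kappa, rate_loss, eps_prime

# Example: 2^20 blocks of 2^20 bits each, of which 2^10 blocks are sampled.
kappa, rate_loss, eps_prime = sampling_loss(
    n=2 ** 20, r=2 ** 10, log_alphabet=2 ** 20, xi=0.01, eps=2.0 ** -80
)
print(kappa, rate_loss, eps_prime)
```

In this example $\kappa = 2^{-10}$, so the term $2\kappa \log 1/\kappa$ is about 0.02 and the total rate loss stays below 0.05, while the smoothing parameter is dominated by the fourth root of the sampler's failure probability.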

It is instructive to apply this result to the $\binom{n}{r}$-subset sampler: Here the error probability $\varepsilon$
decays exponentially with r for any fixed $\xi \in [0, 1]$. More precisely, the following reformulation
of (9) is obtained by setting $\Delta = 3\xi + 2\kappa \log 1/\kappa$. We then have

$$R^{\varepsilon}_{\min}(X_S|SE)_\rho \geq R_{\min}(X^n|E)_\rho - \Delta \quad \text{for any } \Delta > 2\kappa \log 1/\kappa , \text{ where}$$

Thus (smooth) min-entropy-rate is preserved up to a constant, with an exponentially small error $\varepsilon$.
3.2 Related work



We briefly explain how our contribution relates to other known results. We stress that giving a 
comprehensive review of all the relevant areas is not the aim of this section. Nor do we attempt 
to provide a complete list of references; the pointers given here are mainly intended to facilitate 
access to further literature. We identify the following broad points of contact with previous work: 

Quantum information about classical random variables: Random access encodings 

Our main result is an upper bound on the amount of information a quantum system gives about
certain classical values. As such, it fits into a long line of work, the most prominent example of
which is Holevo's upper bound on the accessible information [Hol73].

More specifically, our result bounds the information about a (randomly selected) substring
$X_S = (X_{s_1}, \ldots, X_{s_r})$ of a classical string $X^n = (X_1, \ldots, X_n)$. In this sense, it is structurally
identical to the random access encodings studied by Ambainis, Nayak, Ta-Shma and Vazirani [ANTSV99].
Formally, an $n \overset{p}{\mapsto} m$ random access encoding maps n-bit strings $X^n = (X_1, \ldots, X_n)$ into m-qubit
states $\rho_x$ in a way that allows one to retrieve any (single) bit $X_i$ with probability at least p
by a measurement.^9 Strengthening the result of [ANTSV99], Nayak [Nay99] showed that at least
$m \geq (1 - h(p)) n$ qubits are needed for this kind of encoding. (Here $h(\cdot)$ is the binary entropy
function.) This can be understood as a precise expression of the qualitative statement that m qubits
cannot be used to store more than m classical bits.

Recently, this result has been significantly generalized by Ben-Aroya, Regev and de Wolf [BARd07].
They studied $\binom{n}{r} \overset{p}{\mapsto} m$ encodings, where the aim is to be able to retrieve each substring $X_S$ of
length $r = |S|$ with probability at least p from the m-qubit state. They showed that the success
probability p decreases exponentially in r when m < 0.7n. The result [BARd07] of Ben-Aroya et
al. is of the same form as ours. Indeed, as explained below, in terms of entropies, it expresses the
fact that in the studied situation, the entropy-rate is preserved. However, there are at least three
major differences to our work.

Firstly, [BARd07] provides an upper bound on the guessing probability $p(X_S|E)$, i.e., the
probability of retrieving the correct value given quantum information E, which is the figure of
merit in the context of random access encodings. By virtue of the identity $p(X_S|E) = 2^{-H_{\min}(X_S|E)}$
(see [KSR07] for more details), their result implies a lower bound on the min-entropy (and, hence,
also on the smooth min-entropy for any $\varepsilon > 0$). In contrast, we derive a lower bound on the
smooth min-entropy $H^\varepsilon_{\min}(X_S|E)$, which is the relevant quantity in the context of randomness
extraction (e.g., in the bounded storage model). This, in turn, implies an upper bound on the
guessing probability $p(X_S|E) \leq 2^{-H^\varepsilon_{\min}(X_S|E)} + \varepsilon$. Because our result is not optimized for very
small $\varepsilon$, the upper bound on the guessing probability following from our result might be far below
the bound of [BARd07]. On the other hand, the bound on the smooth min-entropy implied by
the result of [BARd07] is below our bound, which is asymptotically optimal.

A second, apparently insignificant yet important difference between [BARd07] and our work is
the alphabet size of the random variables $X_i$ in the tuple $X^n = (X_1, \ldots, X_n) \in \mathcal{X}^n$. While these
are single bits in [BARd07], they may be random variables over a large alphabet in our work, i.e.,
every $X_i \in \{0, 1\}^c$ is itself a c-bit string for some (usually large^10) c. In the latter case, choosing
a random subset $S \subset [n]$ of size $r = |S|$ effectively generates a substring $X_S = (X_{s_1}, \ldots, X_{s_r})$
of length $\ell = cr$ by blockwise sampling. For $c \gg \log n$, this procedure consumes only $\log \binom{n}{r} \leq
r \log n \ll \ell$ random bits, in contrast to $\log \binom{cn}{\ell} \geq \ell$ when the individual bits are chosen at random.

^9 The notation used here is slightly different from these original papers.

^10 Note that our main result as stated in Corollary 6.19 does not directly apply to cases where the alphabet of the
random variables $X_i$ is too small, e.g., if they are single bits. However, our result can be extended to these cases in the
following way. Given, for instance, a bitstring $B = (B_1, \ldots, B_N)$, the permuted string $B_\pi := (B_{\pi(1)}, \ldots, B_{\pi(N)})$,
for any permutation $\pi \in S_N$, has the same min-entropy as B. We can therefore apply Corollary 6.19 to the
permuted string $B_\pi$, for a randomly chosen $\pi$, and appropriately chosen partitioning $B_\pi = (X_1, \ldots, X_n)$ into n
blocks, resulting in a substring $B' = X_S$ with high min-entropy. Since, after undoing the permutation on $B'$, this
string is identically distributed as a bitstring chosen at random from B, we conclude that the min-entropy rate is
essentially conserved under random sampling of bits.
When applied to the bounded storage model, this means that we can extract more bits than the
number of initial (shared) key bits. On the other hand, while the sample-and-hash approach can
in principle be applied using the result of [BARd07], the number of extracted bits is much smaller
than the number of initial key bits, that is, no significant key expansion can be achieved.
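
The difference in seed length between blockwise and bitwise sampling can be made concrete with a short computation. The Python sketch below compares the number of random bits needed to select r out of n blocks with the number needed to select the same total number of bits individually out of all cn bit positions. The function names and the parameter values are illustrative assumptions; logarithms are taken to base 2.

```python
import math

def seed_bits_blockwise(n, r):
    """log2 of the number of r-subsets of n blocks."""
    return math.log2(math.comb(n, r))

def seed_bits_bitwise(n, c, r):
    """log2 of the number of ways to pick c*r individual bits out of c*n bits."""
    ell = c * r
    return math.log2(math.comb(c * n, ell))

n, r, c = 256, 8, 256        # 256 blocks of 256 bits each, 8 blocks sampled
ell = c * r                  # length of the sampled substring in bits
block_seed = seed_bits_blockwise(n, r)
bit_seed = seed_bits_bitwise(n, c, r)
print(block_seed, r * math.log2(n), ell, bit_seed)
```

For these values the blockwise seed is below $r \log n = 64$ bits, far shorter than the 2048 sampled bits, whereas bitwise sampling needs a seed longer than the sample itself, so no key expansion would be possible.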

Thirdly, the result of [BARd07] measures the initial quantum information about the string X
in terms of the number m of qubits used in the encoding. More precisely, it is assumed that X is
uniformly distributed and that at most m qubits containing information about X are stored in a
quantum system E (formally, $H_0(E) \leq m$, where $H_0(E)$ denotes the logarithm of the dimension
of E). In contrast, our result applies more generally to situations where merely a lower bound
on the quantity $H_{\min}(X|E)$ is known, while the quantum system E may be arbitrarily large.
The above special case where the dimension of E is bounded follows from the general fact that
$H_{\min}(X|E) \geq H_{\min}(X) - H_0(E)$.

Key extraction: Extractors and privacy amplification 

The study of key extraction in the presence of a classical adversary is, as argued above, equivalent
to the question of constructing randomness extractors (see [Sha02] for a survey of this intensely
studied subject). More specifically, two-universal hashing was first applied to privacy amplification
in [BBR88, BBCM95]. Maurer and Dziembowski [DM02, DM04] obtained optimal protocols
for key extraction in the (classical) bounded storage model. Lu [Lu02] made the connection to
locally (or on-line) computable strong extractors. Vadhan subsequently gave essentially optimal
constructions by showing that sampling preserves min-entropy [Vad03]; the sampling approach
for extracting randomness can be traced back to the work of Nisan and Zuckerman [NZ96] and
abounds in the randomness extractor literature.

The situation in the presence of an adversary with prior quantum information is more intricate,
and much less is known to date. On the negative side, Gavinsky, Kempe, Kerenidis, Raz
and de Wolf [GKK+07] gave a surprising example of a classical extractor which fails to extract
randomness in the presence of a quantum adversary (with a similar amount of quantum memory).
On the positive side, Renner [Ren05] showed that two-universal hashing is optimal in the amount
of extracted key (see also [RK05]). König and Terhal [KT07] showed that strong extractors with
binary output also extract secure bits against quantum adversaries; this provides quantum
extractors with short seeds, but does not achieve significant key expansion in the bounded storage model.
Recently, new constructions of quantum extractors were proposed by Fehr and Schaffner [FS07].
While these extractors can be used for privacy amplification, their parameters are not suitable for
the bounded storage model.

4 Proof sketch 

In this section, we give an informal overview of the main ideas involved in the proof of the
result. In Section 4.1 we give a simple proof of an analogous statement for the (classical)
Shannon entropy. Our proof for (quantum) min-entropy mimics this line of argument, but differs
in a few major points, as discussed below.

A few of our techniques may be of independent interest. A central idea is the splitting of a
state into several components based on conditional operators; it leads to a modified chain-rule for
min-entropies. We explain this in Section 4.2. The converse procedure, which we call recombining,
is especially interesting when only subsets of the split states are used in the recombination. The
outcome of such a partial recombination is a state which approximates the original state. By
selecting split states in a systematic fashion, we can single out the high-entropy components of a
state. As we explain in Section 4.3, this is a fundamental tool for showing that a given state has
a certain amount of (smooth) min-entropy.

We will conclude this part of the paper with an overview of how these two procedures - the 
splitting and the recombining - can be combined with an argument about samplers to give the 
result we seek. 
We stress that this section is introductory in nature, and the technical details are left to later
sections. In particular, we will only argue qualitatively, and the formulas in Sections 4.2 and 4.3
are not meant to be taken literally. However, the basic structure of our arguments will be exactly
as sketched here.

4.1 Proof idea 

We show how to derive a modified statement related to our main result, where we restrict our attention to
probability distributions and where the min-entropy $H_{\min}(A|B)$ is replaced by the (conditional)
Shannon entropy $H(A|B) := H(AB) - H(B)$. (Here $H(X) = -\sum_x P_X(x)\log P_X(x)$ denotes
the usual Shannon entropy.) This kind of proof is sketched in [NZ96] to give an intuition why
samplers are good condensers. However, neither the proofs in [NZ96] nor Vadhan's proof [Vad03]
proceed along these lines.

The essential properties of the Shannon entropy used are the subadditivity property

$$H(A|BC) \leq H(A|B) , \qquad (10)$$

i.e., the fact that further conditioning can only reduce the entropy, and the chain-rule

$$H(AB|C) = H(A|BC) + H(B|C) . \qquad (11)$$

Consider a probability distribution $P_{X^nE}$, where $X^n = (X_1, \ldots, X_n)$ is an n-tuple of random
variables. Our aim is to show that with high probability over a randomly chosen subset $S \subset [n]$
of size $|S| = r$, the entropy $H(X_S|E)$ is approximately equal to $\frac{r}{n}H(X^n|E)$.

To abbreviate the notation, we will define

$$X_{>j} = X_{j+1}X_{j+2}\cdots X_n , \qquad X_{\leq j} = X_1X_2\cdots X_j \quad\text{for } j \in [n] , \qquad X_{>n} = \emptyset$$

for any such n-tuple. The first step is what we call a splitting step: The chain-rule (11) implies
that the entropy $H(X^n|E)$ can be decomposed into its constituents,

$$H(X^n|E) = \sum_{i=1}^{n} a_i , \quad\text{where } a_i := H(X_i|X_{>i}E) \text{ for any } i \in [n] .$$

In other words, we have split the entropy into a sum of individual components.

If we now select a subset $S \subset [n]$ of $r = |S|$ indices at random, then Chernoff's inequality
implies that the inequality

$$\frac{1}{r}\sum_{s\in S} a_s \geq \frac{1}{n}H(X^n|E) - O(1/\sqrt{r}) \qquad (12)$$

holds except with probability exponentially small in r. Note that this holds more generally for
any $(n, \xi, \varepsilon)$-sampler S with corresponding adaptations.
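
The concentration behind (12) can be illustrated numerically. The following sketch is not part of the argument: the contributions $a_i$ are hypothetical values drawn at random from $[0,1]$, and the helper name `fraction_of_bad_subsets` as well as the parameters `r`, `t` and `trials` are assumptions made for illustration. It draws uniformly random subsets $S$ of size $r$ and counts how often the sample average falls short of the overall average by more than `t`.

```python
import random

def fraction_of_bad_subsets(a, r, t, trials, seed=0):
    """Estimate Pr_S[(1/r) * sum_{s in S} a_s < (1/n) * sum_i a_i - t]
    over uniformly random subsets S of size r (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(a)
    overall = sum(a) / n
    bad = 0
    for _ in range(trials):
        subset = rng.sample(range(n), r)  # r distinct positions
        sample_avg = sum(a[i] for i in subset) / r
        if sample_avg < overall - t:
            bad += 1
    return bad / trials

# Hypothetical per-position contributions a_i, normalised to [0, 1].
values_rng = random.Random(1)
a = [values_rng.random() for _ in range(1000)]
bad_fraction = fraction_of_bad_subsets(a, r=100, t=0.2, trials=2000)
print(bad_fraction)
```

For values in $[0,1]$, Hoeffding's inequality (which also covers sampling without replacement) bounds the failure probability by $e^{-2rt^2}$, about $3.4\cdot 10^{-4}$ for the parameters chosen here, so the estimated fraction should be essentially zero.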
By strong subadditivity, we have

$$a_j = H(X_j|X_{>j}E) \leq H(X_j|X_{>j\cap S}E) \quad\text{for any } j \in [n] , \qquad (13)$$

where $X_{>j\cap S}$ is the concatenation of all variables $X_i$ with $i > j$ and $i \in S$. With this inequality
we can essentially eliminate all variables $X_i$ with $i \notin S$ from our inequalities.

The final step is what we call a recombination step: Using the chain rule once again, we obtain
with (13)

$$H(X_S|E) = \sum_{s\in S} H(X_s|X_{>s\cap S}E) \geq \sum_{s\in S} a_s .$$

In other words, we can get a lower bound on the joint entropy $H(X_S|E)$ by combining the individual
contributions $a_s$, $s \in S$.

With (12), we conclude that with all but exponentially small probability, the (Shannon) entropy
rate is preserved when selecting a random subset.
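
The splitting and recombination steps of this classical sketch can be checked on a small example. The code below is only an illustration under stated assumptions: the joint distribution is a hypothetical random one on four bits $X_1,\ldots,X_4$ and a ternary $E$, and the helper names `entropy` and `cond_entropy` are not from the paper. It verifies that the contributions $a_i = H(X_i|X_{>i}E)$ sum to $H(X^n|E)$ and that $H(X_S|E) \geq \sum_{s\in S} a_s$ for every subset $S$.

```python
import itertools
import numpy as np

def entropy(p, keep):
    """Shannon entropy in bits of the marginal of p on the axes in `keep`."""
    drop = tuple(ax for ax in range(p.ndim) if ax not in keep)
    q = np.asarray(p.sum(axis=drop)).ravel()
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def cond_entropy(p, a, b):
    """H(A|B) = H(AB) - H(B) for sets of axes a and b."""
    return entropy(p, set(a) | set(b)) - entropy(p, set(b))

n = 4                      # X_1..X_4 live on axes 0..3, E on axis 4
E = n
rng = np.random.default_rng(0)
p = rng.random((2,) * n + (3,))
p /= p.sum()               # hypothetical joint distribution P_{X^n E}

# Splitting step: a_i = H(X_i | X_{>i} E)
a = [cond_entropy(p, {i}, set(range(i + 1, n)) | {E}) for i in range(n)]
total = cond_entropy(p, set(range(n)), {E})

# Recombination step: H(X_S|E) >= sum_{s in S} a_s for every subset S
violations = 0
for r in range(1, n + 1):
    for S in itertools.combinations(range(n), r):
        if cond_entropy(p, set(S), {E}) < sum(a[s] for s in S) - 1e-9:
            violations += 1
print(abs(sum(a) - total), violations)
```

The check over all subsets is exhaustive only because $n = 4$; the sampling argument is what guarantees that a random $S$ also captures close to the fraction $r/n$ of the total.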

The proof of our main result for min-entropy follows the same lines, with a modified chain-rule
for min-entropies. Notice that the chain-rule in the form (11) can be seen as the combination of
two inequalities,

$$H(AB|C) \leq H(A|BC) + H(B|C) , \qquad (14)$$
$$H(AB|C) \geq H(A|BC) + H(B|C) , \qquad (15)$$

both of which are used in the proof sketch. Indeed, the first inequality (14) allows us to divide the
joint entropy $H(X^n|E)$ into a sum of individual contributions, whereas the second inequality (15)
provides a lower bound on the joint entropy $H(X_S|E)$ in terms of its components. We refer to
the first application as a splitting step and the second application as a recombination step. For the
min-entropy, these two steps are more involved; we do not only split and recombine entropies, but
corresponding quantum states, as explained in the next section.

4.2 Towards a modified chain-rule: Entropy-splitting

The subadditivity property (10) is easily shown to hold for the min-entropy. Similarly, a
recombination-chain-rule (15) can be proved for min-entropy. However, the splitting-chain-rule (14) is
no longer true for min-entropies and has to be replaced by a more subtle statement. This can
be seen as a quantum version of the entropy splitting lemma proposed in [Wul07]. It is a major
component of our proof and may be of independent interest.

To state this modified splitting-chain-rule, consider a state $\rho_{ABC}$ with purification $|\Psi_{ABCD}\rangle$.
We will construct a decomposition

$$|\Psi_{ABCD}\rangle = \sum_{\alpha} |\Psi^{\alpha}_{ABCD}\rangle \qquad (16)$$

of $|\Psi_{ABCD}\rangle$ into mutually orthogonal subnormalised states $\{|\Psi^{\alpha}_{ABCD}\rangle\}_{\alpha}$ such that

$$H_{\min}(A|BC)_{\rho^{\alpha}} + H_{\min}(B|C)_{\rho^{\alpha}} \geq H_{\min}(AB|C)_{\rho} \qquad (17)$$

for every $\alpha$. In contrast to (14), this statement splits the entropy into a sum of individual entropies
of states which are different from the original state $|\Psi_{ABCD}\rangle$. They are, however, directly related
to $|\Psi_{ABCD}\rangle$ by (16): we call these states split states.

For technical reasons, it will be convenient to have a version of (17) which decomposes $|\Psi_{ABCD}\rangle$
into a fixed number $m \in \mathbb{N}$ of states. The indices $\alpha$ are then from the set $[m] := \{1, \ldots, m\}$,
and (17) is replaced by

$$H_{\min}(A|BC)_{\rho^{\alpha}} + H_{\min}(B|C)_{\rho^{\alpha}} \geq H_{\min}(AB|C)_{\rho} - \Delta \quad\text{for all } \alpha \in [m] , \qquad (18)$$

where $\Delta$ is a function of $|\Psi_{ABCD}\rangle$ which can be bounded in situations of interest. (The exact
statement is given as Corollary 5.5 below.) An important property of the split states is that
each $|\Psi^{\alpha}_{ABCD}\rangle$ is the result of applying a projection $Q^{\alpha}_{AD}$ to $|\Psi_{ABCD}\rangle$, where $Q^{\alpha}_{AD}$ only acts
non-trivially on systems A and D.

4.3 (Partial) recombination of split states 

Decomposing a state $|\Psi_{ABC}\rangle$ into a sum of mutually orthogonal states $\{|\Psi^{\alpha}_{ABC}\rangle\}_{\alpha\in[m]}$ gives us
a convenient way of bounding the (smooth) min-entropy of $|\Psi_{ABC}\rangle$. The general procedure is as
follows: Suppose for example that our aim is to bound the quantity $H_{\min}(A|B)_{\rho}$ from below. We will
show that if the entropy $H_{\min}(A|B)_{\rho^{\alpha}}$ is large for every split state $|\Psi^{\alpha}\rangle$, then the same is true for the
quantity $H_{\min}(A|B)_{\rho}$ (up to a correction of size $\log m$, see Lemma 5.8 for a precise statement).

We can use this fact to show that a state $|\Psi_{ABC}\rangle$ is close to a state $|\bar\Psi_{ABC}\rangle$ (with density
operator $\bar\rho$) with large min-entropy $H_{\min}(A|B)_{\bar\rho}$. We start from an arbitrary orthogonal
decomposition of $|\Psi_{ABC}\rangle$ of the form (16) into m states $\{|\Psi^{\alpha}\rangle\}_{\alpha\in[m]}$. We then identify a
subset $\Gamma(\lambda) \subset [m]$ with the property that

$$H_{\min}(A|B)_{\rho^{\alpha}} \geq \lambda \quad\text{for all } \alpha \in \Gamma(\lambda) .$$

We define the partially recombined state

$$|\bar\Psi_{ABC}\rangle := \sum_{\alpha\in\Gamma(\lambda)} |\Psi^{\alpha}_{ABC}\rangle .$$

We can show that $H_{\min}(A|B)_{\bar\rho}$ is large, namely roughly at least $\lambda$. Moreover, since the states
$\{|\Psi^{\alpha}_{ABC}\rangle\}_{\alpha\in[m]}$ are assumed to be orthogonal, we can bound the distance of $|\bar\Psi_{ABC}\rangle$ to the
original state $|\Psi_{ABC}\rangle$ by an expression of the form $2\sqrt{1 - \omega(\Gamma(\lambda))}$, where $\omega(\Gamma(\lambda))$ is the weight
of $\Gamma(\lambda)$ under the probability distribution $\omega(\alpha) = \langle\Psi^{\alpha}_{ABC}|\Psi^{\alpha}_{ABC}\rangle$ on $[m]$. In this way,
showing that the smooth min-entropy $H^{\varepsilon}_{\min}(A|B)_{\rho}$ of $|\Psi_{ABC}\rangle$ is lower bounded by a value $\lambda$
reduces to showing that the corresponding set $\Gamma(\lambda)$ has a large weight under $\omega$.

4.4 Putting it together: splitting, sampling and recombining 

Let us now return to our original problem: Given a quantum state $\rho_{X^nE} = \rho_{X_1\cdots X_nE}$ with
purification $|\Psi\rangle$, we would like to show that $H^{\varepsilon}_{\min}(X_S|E)_{\rho}$ is large with high probability over the
choice of $S \subset [n]$. To illustrate the required steps in the proof, let us consider a simple example
where n = 4.

The first step is to apply the splitting rule to $|\Psi\rangle$, dividing the joint entropy $H_{\min}(X^4|E)$ into
a contribution from $X_1$ and the remainder. This gives m states $\{|\Psi^{\alpha_1}\rangle\}_{\alpha_1\in[m]}$ with the property
that for all $\alpha_1 \in [m]$,

$$H_{\min}(X^4|E)_{\rho} \leq H_{\min}(X_1|X_{>1}E)_{\rho^{\alpha_1}} + H_{\min}(X_{>1}|E)_{\rho^{\alpha_1}} . \qquad (19)$$

(Here $\rho^{\alpha_1} = |\Psi^{\alpha_1}\rangle\langle\Psi^{\alpha_1}|$ denotes the density operator corresponding to $|\Psi^{\alpha_1}\rangle$.) We then apply the
splitting-chain-rule to each of these states in order to split $H_{\min}(X_{>1}|E)_{\rho^{\alpha_1}} = H_{\min}(X_2X_3X_4|E)_{\rho^{\alpha_1}}$
into the contribution of $X_2$ and the remaining part. This results, for each $\alpha_1 \in [m]$, in a collection
of states $\{|\Psi^{\alpha_1\alpha_2}\rangle\}_{\alpha_2\in[m]}$ satisfying

$$H_{\min}(X_{>1}|E)_{\rho^{\alpha_1}} \leq H_{\min}(X_2|X_{>2}E)_{\rho^{\alpha_1\alpha_2}} + H_{\min}(X_{>2}|E)_{\rho^{\alpha_1\alpha_2}} . \qquad (20)$$

Finally, dividing the last term into contributions from $X_3$ and $X_4$, we get, for each $(\alpha_1, \alpha_2) \in [m]^2$,
a family of states $\{|\Psi^{\alpha_1\alpha_2\alpha_3}\rangle\}_{\alpha_3\in[m]}$ such that

$$H_{\min}(X_{>2}|E)_{\rho^{\alpha_1\alpha_2}} \leq H_{\min}(X_3|X_{>3}E)_{\rho^{\alpha_1\alpha_2\alpha_3}} + H_{\min}(X_4|E)_{\rho^{\alpha_1\alpha_2\alpha_3}} . \qquad (21)$$

This completes the splitting step. Summarising, we have obtained a collection of states starting
from $|\Psi\rangle$: the states $\{|\Psi^{\alpha_1}\rangle\}_{\alpha_1\in[m]}$ obtained by applying the splitting-chain-rule once, the states
$\{|\Psi^{\alpha_1\alpha_2}\rangle\}_{(\alpha_1,\alpha_2)\in[m]^2}$ that are the result of splitting twice, and so on.

A useful geometric visualisation (which is, however, not essential for the proof) is obtained by
placing these states at the vertices of an m-ary tree (in this case of depth 3). We place $|\Psi\rangle$ at
the root, and the descendants of each vertex are the split states obtained by splitting. Thus every
3-tuple $(\alpha_1, \alpha_2, \alpha_3) \in [m]^3$ specifies a path with vertex labels
$(|\Psi\rangle, |\Psi^{\alpha_1}\rangle, |\Psi^{\alpha_1\alpha_2}\rangle, |\Psi^{\alpha_1\alpha_2\alpha_3}\rangle)$ from the root to a leaf.

Let us combine inequalities (19)-(21) into

$$H_{\min}(X^4|E)_{\rho} \leq H_{\min}(X_1|X_{>1}E)_{\rho^{\alpha_1}} + H_{\min}(X_2|X_{>2}E)_{\rho^{\alpha_1\alpha_2}} + H_{\min}(X_3|X_{>3}E)_{\rho^{\alpha_1\alpha_2\alpha_3}} + H_{\min}(X_4|E)_{\rho^{\alpha_1\alpha_2\alpha_3}}$$

for all $\alpha^3 = (\alpha_1, \alpha_2, \alpha_3) \in [m]^3$.
By attaching the entropies of interest to the edges of the mentioned tree, we can interpret this
inequality as expressing the fact that the sum of the values of the edges along each path of the
tree from the root to a leaf is lower bounded by $H_{\min}(X^4|E)_{\rho}$.

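The next steps to consider are the sampling and recombination steps. Our aim is to show
that the smooth entropy $H^{\varepsilon}_{\min}(X_S|E)_{\rho}$ is large (with high probability over the choice of the subset
$S \subset [4]$). We follow the procedure outlined in the previous section. That is, we define the
recombined state

$$|\bar\Psi\rangle := \sum_{\alpha^3\in\Gamma(\lambda,S)} |\Psi^{\alpha^3}\rangle ,$$

with corresponding density operator $\bar\rho$, where $\Gamma(\lambda, S)$ is the set of paths
$\alpha^3 = (\alpha_1, \alpha_2, \alpha_3) \in [m]^3$ with the property that

$$\delta_{1\in S}H_{\min}(X_1|X_{>1}E)_{\rho^{\alpha_1}} + \delta_{2\in S}H_{\min}(X_2|X_{>2}E)_{\rho^{\alpha_1\alpha_2}} + \delta_{3\in S}H_{\min}(X_3|X_{>3}E)_{\rho^{\alpha_1\alpha_2\alpha_3}} + \delta_{4\in S}H_{\min}(X_4|E)_{\rho^{\alpha_1\alpha_2\alpha_3}} \geq \lambda .$$

(Here $\delta_{j\in S}$ equals 1 if $j \in S$ and 0 otherwise.) In other words, we restrict our attention to
paths (and corresponding states) which, when restricted to S, have large entropy. We then need
to show the following:

(i) with high probability over the choice of S, the state $|\bar\Psi\rangle$ is close to $|\Psi\rangle$;

(ii) the entropy $H_{\min}(X_S|E)_{\bar\rho}$ is large.

The proof of (i) again involves a bound of the form

$$\bigl\| |\Psi\rangle\langle\Psi| - |\bar\Psi\rangle\langle\bar\Psi| \bigr\| \leq \sqrt{1 - \omega(\Gamma(\lambda, S))} ,$$

where $\omega(\alpha^3) = \mathrm{tr}\,|\Psi^{\alpha^3}\rangle\langle\Psi^{\alpha^3}|$, $\alpha^3 \in [m]^3$, is a (fixed) probability distribution on the leaves. We will
show in a later section that a sampler has the following property when applied to the situation
described above: With high probability over the choice of S, the weight $\omega(\Gamma(\lambda, S))$ is large. More
generally, we show how the sampler-property extends from a single sequence of values to the case
of a matrix of values (in our case corresponding to edges of a tree).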

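The proof of (ii) is done inductively using subadditivity, the recombination-chain-rule, and the
recombination argument outlined above. For concreteness, suppose for example that $S = \{2, 4\}$.
Then we have

$$H_{\min}(X_2|X_{>2}E)_{\rho^{\alpha_1\alpha_2}} + H_{\min}(X_4|E)_{\rho^{\alpha_1\alpha_2\alpha_3}} \geq \lambda$$

for all $\alpha^3 = (\alpha_1, \alpha_2, \alpha_3) \in \Gamma(\lambda, S) \subset [m]^3$. It is convenient to rephrase this as follows, writing
$\alpha^2 = (\alpha_1, \alpha_2)$. We then have for all $\alpha^2 \in [m]^2$

$$H_{\min}(X_4|E)_{\rho^{(\alpha^2,\alpha_3)}} \geq \lambda - H_{\min}(X_2|X_{>2}E)_{\rho^{\alpha^2}} \quad\text{for all } \alpha_3 \text{ with } (\alpha^2, \alpha_3) \in \Gamma(\lambda, S) .$$

In particular, when we apply this to the (intermediate) partially recombined states

$$|\bar\Psi^{\alpha^2}\rangle := \sum_{\alpha_3 : (\alpha^2,\alpha_3)\in\Gamma(\lambda,S)} |\Psi^{(\alpha^2,\alpha_3)}\rangle , \qquad (22)$$

with density operators $\bar\rho^{\alpha^2}$, we obtain

$$H_{\min}(X_4|E)_{\bar\rho^{\alpha^2}} \geq \lambda - H_{\min}(X_2|X_{>2}E)_{\rho^{\alpha^2}} \quad\text{for all } \alpha^2 \in [m]^2 .$$

We will also use the fact that the recombined states satisfy
$H_{\min}(X_2|X_{>2}E)_{\bar\rho^{\alpha^2}} \geq H_{\min}(X_2|X_{>2}E)_{\rho^{\alpha^2}}$
(see Lemma 6.6). Subadditivity gives $H_{\min}(X_2|X_4E)_{\bar\rho^{\alpha^2}} \geq H_{\min}(X_2|X_{>2}E)_{\bar\rho^{\alpha^2}}$ for all $\alpha^2 \in [m]^2$.
With the previous two inequalities, we therefore get

$$H_{\min}(X_4|E)_{\bar\rho^{\alpha^2}} + H_{\min}(X_2|X_4E)_{\bar\rho^{\alpha^2}} \geq \lambda .$$

This in turn implies

$$H_{\min}(X_2X_4|E)_{\bar\rho^{\alpha^2}} \geq \lambda \quad\text{for all } \alpha^2 \in [m]^2$$

by the recombination-chain-rule. Because $|\bar\Psi\rangle$ can be written as a sum of the states (22), the
recombination-procedure then gives

$$H_{\min}(X_2X_4|E)_{\bar\rho} \geq \lambda ,$$

as claimed.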

This line of argument can be followed more generally for a general subset $S \subset [n]$. We will need
intermediate (partially) recombined states $\{|\bar\Psi^{\alpha^j}\rangle\}_{\alpha^j\in[m]^j}$, $j \in [n]$; these can again be thought of as
being attached to the vertices of a tree. They are defined recursively, by recombining "good" states
(i.e., those corresponding to prefixes of elements in $\Gamma(\lambda, S)$). In other words, when recombining,
we work our way up the tree (omitting "bad" states, i.e., those with small entropies).

This concludes our proof sketch; it is now time to elaborate on the details.

5 Rules and tools for min-entropy

In this section, we set the ground for our result concerning samplers. In particular, we formally
introduce the conditional min-entropy $H_{\min}(A|B)_{\rho}$ in Section 5.2. This will be done via an
intermediate quantity $H(A|B)_{\frac{\rho}{\sigma}}$. Most of our rules for min-entropy, the most basic of which are stated
in Section 5.3, apply to these intermediate quantities; they will be our main object of study. In
Section 5.4, we establish our central splitting-chain-rule.

5.1 Preliminaries 

Throughout, we consider nonnegative operators acting on finite-dimensional Hilbert spaces (or
systems) $\mathcal{H}_A, \mathcal{H}_B, \ldots$ and their tensor products. We use subscripts to indicate which systems an
operator acts on. We also use subscripts when we trace out systems, but sometimes make use
of superscripts to denote "tracing out everything but", in the following sense: for a tripartite
state $\rho_{ABC}$, we write $\mathrm{tr}_{BC}(\rho_{ABC}) = \mathrm{tr}^{A}(\rho_{ABC}) = \rho_A$ for the reduced density operator on A. As
explained above, we sometimes abuse notation by omitting identities. For example, we will write
operator inequalities such as

$$\rho_{AB} \leq \sigma_B ,$$

for a bipartite operator $\rho_{AB}$ and an operator $\sigma_B$ on $\mathcal{H}_B$. By this inequality, we simply mean
$\rho_{AB} \leq \mathrm{id}_A \otimes \sigma_B$ (which is defined by the condition that $\mathrm{id}_A \otimes \sigma_B - \rho_{AB}$ is a nonnegative operator). More
generally, when writing operators on multipartite systems, we omit identities whenever a unique
meaningful statement can be obtained by tensoring corresponding identities to the operators. To
give an example, we will write expressions such as

$$Q_B\rho_{AB}Q_B \leq P_D\rho_{ABCD}P_D ,$$

where the operators act on the spaces indicated by subscripts, instead of

$$(\mathrm{id}_A \otimes Q_B \otimes \mathrm{id}_{CD})(\rho_{AB} \otimes \mathrm{id}_{CD})(\mathrm{id}_A \otimes Q_B \otimes \mathrm{id}_{CD}) \leq (\mathrm{id}_{ABC} \otimes P_D)\rho_{ABCD}(\mathrm{id}_{ABC} \otimes P_D) .$$

Basic properties of operator inequalities we need are their preservation under partial traces
and the application of operators, i.e., the fact that $\rho_{AB} \leq \sigma_{AB}$ implies that

$$\rho_A \leq \sigma_A$$

and

$$T_{AB}\rho_{AB}T_{AB}^{\dagger} \leq T_{AB}\sigma_{AB}T_{AB}^{\dagger}$$

for any operator $T_{AB}$ on $\mathcal{H}_A \otimes \mathcal{H}_B$.

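For two operators $\rho_{AB}$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ and $\sigma_B$ on $\mathcal{H}_B$ such that the support of $\rho_B$ is contained
in the support of $\sigma_B$, the conditional operator is defined as

$$\frac{\rho_{AB}}{\sigma_B} := \sigma_B^{-1/2}\rho_{AB}\sigma_B^{-1/2} . \qquad (23)$$

Here $\sigma_B^{-1/2} := \sqrt{\sigma_B^{-1}}$, where $\sigma_B^{-1}$ is the generalised inverse of $\sigma_B$. An important property of
conditional operators is that

$$\frac{\rho_{AB}}{\sigma_B} = \mathrm{tr}_C\Bigl(\frac{\rho_{ABC}}{\sigma_B}\Bigr) \qquad (24)$$

for any tripartite operator $\rho_{ABC}$.

We will say that a bipartite operator $\rho_{AE}$ on $\mathcal{H}_A \otimes \mathcal{H}_E$ is classical on A (relative to an
orthonormal basis $\{|a\rangle\}_a$ of $\mathcal{H}_A$) if it has the form $\rho_{AE} = \sum_a |a\rangle\langle a| \otimes \rho^a_E$. Clearly, if $\rho_{AE}$ is
classical on A relative to $\{|a\rangle\}_a$, then so is $\rho'_{AE} = O_{AE}\rho_{AE}O_{AE}^{\dagger}$, for any operator of the form
$O_{AE} = \sum_a |a\rangle\langle a| \otimes O^a_E$. One can verify that this statement is still true when considering
purifications and additional classical systems: If $|\Psi_{ABEF}\rangle$ is such that the reduced density operator
$\rho_{ABE}$ is classical on both A and B (relative to some orthonormal bases) then the same is true
for the state $O_{AE}|\Psi_{ABEF}\rangle$.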



5.2 Definition of min-entropy 

As already mentioned, every pair of operators $\rho_{AB}$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ and $\sigma_B$ on $\mathcal{H}_B$ such that the
support of $\rho_B$ is contained in the support of $\sigma_B$ gives rise to a conditional operator $\frac{\rho_{AB}}{\sigma_B}$. We define
the quantity $H(A|B)_{\frac{\rho}{\sigma}}$ as minus the logarithm of the maximal eigenvalue of this conditional
operator, that is

$$H(A|B)_{\frac{\rho}{\sigma}} := -\log\lambda_{\max}\Bigl(\frac{\rho_{AB}}{\sigma_B}\Bigr) .$$

In some sense, this can be read as "the entropy of $\rho_A$ when it is conditioned on $\sigma_B$"; in the case
where A is classical, the operator $\sigma_B$ is related to a measurement on $\mathcal{H}_B$ (which is supposed to
reproduce the value on A, see [KSR07]).

Maximising this quantity over all nonnegative trace-one operators $\sigma_B$ whose support contains
the support of $\rho_B$ gives the min-entropy of A given B, defined as

$$H_{\min}(A|B)_{\rho} := \sup_{\sigma_B} H(A|B)_{\frac{\rho}{\sigma}} . \qquad (25)$$

This quantity has a simple operational interpretation, as will be shown in a forthcoming
publication [KSR07]: it is equivalent to the maximal probability of guessing A given B, in the case where
A is classical.
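
For intuition, the definition can be evaluated directly in the simplest setting where both A and B are classical. The sketch below is an illustration under assumptions, not part of the paper: the joint distribution `P` is hypothetical, the helper `h_cond` is an illustrative name, and the candidate `sigma_opt` is chosen using the known fact that for classical side information the supremum in (25) equals $-\log p_{\text{guess}}$.

```python
import numpy as np

def h_cond(rho_ab, sigma_b, dim_a):
    """-log2 of the largest eigenvalue of the conditional operator
    (id_A x sigma_B^{-1/2}) rho_AB (id_A x sigma_B^{-1/2})."""
    w, v = np.linalg.eigh(sigma_b)
    inv_sqrt = np.array([1.0 / np.sqrt(x) if x > 1e-12 else 0.0 for x in w])
    s = (v * inv_sqrt) @ v.conj().T          # generalised inverse square root
    full = np.kron(np.eye(dim_a), s)
    cond = full @ rho_ab @ full
    return float(-np.log2(np.linalg.eigvalsh(cond).max()))

# Hypothetical joint distribution P(x, e): rows index x, columns index e.
P = np.array([[0.50, 0.05],
              [0.10, 0.35]])
rho_ab = np.diag(P.ravel())                  # classical state, ordering A (x) B

# Conditioning on sigma_B = rho_B ...
rho_b = np.diag(P.sum(axis=0))
h_with_rho_b = h_cond(rho_ab, rho_b, dim_a=2)

# ... versus a trace-one sigma_B proportional to max_x P(x, e).
p_guess = P.max(axis=0).sum()                # guess the likeliest x for each e
sigma_opt = np.diag(P.max(axis=0) / p_guess)
h_with_sigma_opt = h_cond(rho_ab, sigma_opt, dim_a=2)

print(h_with_rho_b, h_with_sigma_opt, -np.log2(p_guess))
```

In this example the choice $\sigma_B = \rho_B$ gives a strictly smaller value than `sigma_opt`, which shows why the supremum over $\sigma_B$ in (25) matters.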

Note that for $\sigma_B = \rho_B$, definition (23) coincides with the conditional operator $\rho_{A|B} = \frac{\rho_{AB}}{\rho_B}$ discussed, e.g.,
in [Lei07].

The generalised inverse $\sigma^{-1}$ of an operator $\sigma$ is defined as the operator which has the same eigenspaces as $\sigma$,
with zero eigenvalue on the null eigenspace of $\sigma$ and eigenvalue $\lambda^{-1}$ on the eigenspace of $\sigma$ corresponding to the
eigenvalue $\lambda > 0$.

The statement about purifications in Section 5.1 can be seen by decomposing the state as
$|\Psi_{ABEF}\rangle = \sum_{a,b} |a\rangle|b\rangle|\varphi^{a,b}_{EF}\rangle$. Classicality of the state $\rho_{ABE}$ on
A and B then implies that $\mathrm{tr}_F(|\varphi^{a,b}_{EF}\rangle\langle\varphi^{a',b'}_{EF}|) = 0$ whenever $(a, b) \neq (a', b')$. The claim can then be deduced from
the fact that $\mathrm{tr}_F\bigl(O^a|\varphi^{a,b}_{EF}\rangle\langle\varphi^{a',b'}_{EF}|(O^{a'})^{\dagger}\bigr) = O^a\,\mathrm{tr}_F\bigl(|\varphi^{a,b}_{EF}\rangle\langle\varphi^{a',b'}_{EF}|\bigr)(O^{a'})^{\dagger}$.

In the following, we will always assume that the support of $\rho_B$ is contained in the support of $\sigma_B$, such that the
conditional operator is well defined.

All logarithms log are binary; natural logarithms will be denoted by ln.

In Section 2.6, the quantity $H_{\min}(A|B)_{\rho}$ was introduced without explicit reference to the intermediate quantities
$H(A|B)_{\frac{\rho}{\sigma}}$.
While (25) is ultimately the quantity of interest, the intermediate quantities $H(A|B)_{\frac{\rho}{\sigma}}$ are
easier to manipulate, and satisfy various useful rules. As we will see below, most of these follow
more or less directly from the alternative characterisation

$$H(A|B)_{\frac{\rho}{\sigma}} = -\log\min\Bigl\{\lambda : \frac{\rho_{AB}}{\sigma_B} \leq \lambda\cdot\mathrm{id}_{AB}\Bigr\} \qquad (26)$$

in terms of a family of operator inequalities.

For consistency reasons, it is convenient to set

$$H(A|B)_{\rho} := H(A|B)_{\frac{\rho}{\rho}}$$

$$H(\emptyset|B)_{\frac{\rho}{\sigma}} := -\log\min\Bigl\{\lambda : \frac{\rho_B}{\sigma_B} \leq \lambda\cdot\mathrm{id}\Bigr\} ,$$

where $\frac{\rho_B}{\sigma_B} = \sigma_B^{-1/2}\rho_B\sigma_B^{-1/2}$. Note that the latter quantity is equal to $H(A|B)_{\frac{\rho}{\sigma}}$ if the Hilbert
space $\mathcal{H}_A$ corresponding to system A is trivial, i.e., $\mathcal{H}_A \cong \mathbb{C}$. We can think of the quantity
$H(\emptyset|B)_{\frac{\rho}{\sigma}}$ as a conditional entropy obtained by adjoining a trivial system to B using the isomorphism
$\mathcal{H}_B \cong \mathcal{H}_B \otimes \mathbb{C}$. Informally, this corresponds to a situation where we condition "nothing" on $\sigma_B$;
formally, it will turn out to be convenient to define $H(\emptyset|B)_{\rho} := H(\emptyset|B)_{\frac{\rho}{\rho}}$.

Finally, we will also (formally) encounter situations where $\rho_{AB} = 0$; in these cases, we formally
set $H(A|B)_{\rho} = \infty$, $H(A|B)_{\frac{\rho}{\sigma}} = \infty$, meaning that an arbitrarily large value can be assigned to
these quantities in any identity where they appear.

For a parameter $\varepsilon \geq 0$, the $\varepsilon$-smooth min-entropy of A given B is equal to (cf. [Ren05])

$$H^{\varepsilon}_{\min}(A|B)_{\rho} = \sup_{\substack{\bar\rho_{AB} :\ \|\bar\rho_{AB} - \rho_{AB}\| \leq \varepsilon \\ \mathrm{tr}(\bar\rho_{AB}) \leq 1}} H_{\min}(A|B)_{\bar\rho} ,$$

where the supremum is over all nonnegative operators $\bar\rho_{AB}$ with trace bounded by 1 in an $\varepsilon$-ball
around $\rho_{AB}$. (Here $\|A\| = \mathrm{tr}\sqrt{A^{\dagger}A}$ is the $L_1$-norm.)

As already mentioned, the (smooth) entropy $H^{\varepsilon}_{\min}(X|E)$ captures the number of secret bits
extractable from X with respect to an adversary holding E. The following lemma justifies this
operational interpretation.

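Lemma 5.1. Consider a state $\rho_{XE}$ where X is classical. Let $\rho_{U_{\{0,1\}^{\ell}}}$ denote the completely mixed
state on $\{0, 1\}^{\ell}$. Then

(i) For any $\ell \leq H_{\min}(X|E) - 2\log(1/\varepsilon)$, there is a function $f : \mathcal{X} \times \mathcal{Y} \to \{0, 1\}^{\ell}$ (independent of
$\rho_{XE}$) which extracts an $\ell$-bit string $Z = f(X, Y)$ from X, such that Z is $\varepsilon$-close to uniform
and independent of $(E, Y)$, where Y is a uniform and independent seed. In formulae, we
have

$$\frac{1}{2}\bigl\|\rho_{f(X,Y)YE} - \rho_{U_{\{0,1\}^{\ell}}} \otimes \rho_Y \otimes \rho_E\bigr\| \leq \varepsilon .$$

(ii) For any function $f : \mathcal{X} \to \{0, 1\}^{\ell}$ and $\varepsilon \geq 0$, the inequality

$$\frac{1}{2}\bigl\|\rho_{f(X)E} - \rho_{U_{\{0,1\}^{\ell}}} \otimes \rho_E\bigr\| \leq \varepsilon$$

implies

$$H^{2\varepsilon}_{\min}(X|E)_{\rho} \geq \ell .$$

Proof. Statement (i) is a reformulation of the fact that the two-universal hashing construction is
a quantum extractor, as shown by Renner [Ren05].

For the proof of (ii), let $\rho_{SE} := \rho_{U_{\{0,1\}^{\ell}}} \otimes \rho_E$. Then, obviously

$$H_{\min}(S|E)_{\rho} \geq H(S|E)_{\rho} = \ell .$$

Because $\frac{1}{2}\|\rho_{SE} - \rho_{f(X)E}\| \leq \varepsilon$, this implies

$$H^{2\varepsilon}_{\min}(f(X)|E)_{\rho} \geq \ell .$$

Since the min-entropy can only decrease when applying a function (see [Ren05]), we conclude that

$$H^{2\varepsilon}_{\min}(X|E)_{\rho} \geq \ell ,$$

as desired. □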

5.3 Some basic rules and properties

We now summarise a few basic rules for the quantities $H(A|B)_{\frac{\rho}{\sigma}}$ which directly follow from
the characterisation (26) using standard properties of operator inequalities, as described in Section 5.1.

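Lemma 5.2 (Properties of min-entropy). The min-entropy satisfies the following.

(i) (Positivity for classical systems) Let $\rho_{AB} = \sum_a |a\rangle\langle a| \otimes \rho^a_B$ be classical on A. Then
$H(A|B)_{\rho} \geq 0$.

(ii) (Dimension bound) For any $\rho_{AB}$ and $\sigma_B$ with $\mathrm{tr}(\sigma_B) \leq \mathrm{tr}(\rho_B)$, we have $H(A|B)_{\frac{\rho}{\sigma}} \leq H_0(A)$,
where $H_0(A) = \log\dim\mathcal{H}_A$. In particular, $H(A|B)_{\rho} \leq H_0(A)$. More generally,
$H(B|C)_{\frac{\rho}{\sigma}} \geq H(AB|C)_{\frac{\rho}{\sigma}} - H_0(A)$ for any $\rho_{ABC}$ and $\sigma_C$.

(iii) (Subadditivity) $H(A|B)_{\frac{\rho}{\sigma}} \geq H(A|BC)_{\frac{\rho}{\sigma}}$ for any $\rho_{ABC}$ and $\sigma_{BC}$.

(iv) (Recombination-chain-rule) $H(AB|C)_{\frac{\rho}{\sigma}} \geq H(A|BC)_{\rho} + H(B|C)_{\frac{\rho}{\sigma}}$ for any $\rho_{ABC}$ and $\sigma_C$.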

Proof. (i) directly follows from $|a\rangle\langle a| \otimes \rho^a_B \leq \rho_B$ for all a.

For the proof of the first part of (ii), we simply take the trace on both sides of the inequality
$\rho_{AB} \leq 2^{-H(A|B)_{\frac{\rho}{\sigma}}}\sigma_B$ to get $\mathrm{tr}(\rho_{AB}) \leq 2^{H_0(A) - H(A|B)_{\frac{\rho}{\sigma}}}\mathrm{tr}(\sigma_B)$, which gives the claim because
$\mathrm{tr}(\sigma_B) \leq \mathrm{tr}(\rho_B) = \mathrm{tr}(\rho_{AB})$. For the proof of the second part of (ii), observe that we have
$\rho_{ABC} \leq 2^{-H(AB|C)_{\frac{\rho}{\sigma}}}\sigma_C$ by definition. Tracing out the system A gives

$$\rho_{BC} \leq 2^{-H(AB|C)_{\frac{\rho}{\sigma}} + H_0(A)}\sigma_C .$$

The claim (ii) then follows from the definition of $H(B|C)_{\frac{\rho}{\sigma}}$.

Similarly, (iii) directly follows by tracing out C from the inequality

$$\rho_{ABC} \leq 2^{-H(A|BC)_{\frac{\rho}{\sigma}}}\sigma_{BC} .$$

For the proof of (iv), observe that

$$\rho_{ABC} \leq 2^{-H(A|BC)_{\rho}}\rho_{BC} \leq 2^{-H(A|BC)_{\rho} - H(B|C)_{\frac{\rho}{\sigma}}}\sigma_C .$$

The claim follows from the definition of $H(AB|C)_{\frac{\rho}{\sigma}}$. □

We point out that (ii) and (iii) directly translate into the statements

$$H_{\min}(B|C)_{\rho} \geq H_{\min}(AB|C)_{\rho} - H_0(A)$$

$$H_{\min}(A|B)_{\rho} \geq H_{\min}(A|BC)_{\rho}$$

for the min-entropy. An analogous statement cannot be made for the recombination-chain-rule (iv),
and we will have to retain the dependence on $\sigma_C$ in our arguments.
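
Subadditivity and the recombination-chain-rule of Lemma 5.2 can be sanity-checked numerically. The following sketch is an illustration under assumptions rather than part of the proof: the states are hypothetical random full-rank density matrices on three qubits, and the helpers `ptrace`, `h_cond` and `random_state` are illustrative names. Here `h_cond(rho, sigma, d)` evaluates $-\log\lambda_{\max}$ of the conditional operator, with $\sigma$ acting on the trailing systems.

```python
import numpy as np

def ptrace(rho, dims, keep):
    """Partial trace keeping the subsystems listed in `keep` (sorted indices)."""
    t = rho.reshape(list(dims) + list(dims))
    for k in sorted(set(range(len(dims))) - set(keep), reverse=True):
        t = np.trace(t, axis1=k, axis2=k + t.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

def h_cond(rho, sigma, dim_front):
    """H(front|rest) relative to sigma on `rest`, in bits."""
    w, v = np.linalg.eigh(sigma)
    inv_sqrt = np.array([1.0 / np.sqrt(x) if x > 1e-12 else 0.0 for x in w])
    s = np.kron(np.eye(dim_front), (v * inv_sqrt) @ v.conj().T)
    return float(-np.log2(np.linalg.eigvalsh(s @ rho @ s).max()))

def random_state(d, rng):
    """Hypothetical random full-rank density matrix of dimension d."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
dims = [2, 2, 2]                              # qubits A, B, C
rho_abc = random_state(8, rng)
sigma_bc = random_state(4, rng)
sigma_b = ptrace(sigma_bc, [2, 2], [0])
sigma_c = random_state(2, rng)
rho_ab = ptrace(rho_abc, dims, [0, 1])
rho_bc = ptrace(rho_abc, dims, [1, 2])

# (iii) Subadditivity: H(A|B) relative to sigma_B >= H(A|BC) relative to sigma_BC
lhs_iii = h_cond(rho_ab, sigma_b, 2)
rhs_iii = h_cond(rho_abc, sigma_bc, 2)

# (iv) Recombination: H(AB|C) rel. sigma_C >= H(A|BC)_rho + H(B|C) rel. sigma_C
lhs_iv = h_cond(rho_abc, sigma_c, 4)
rhs_iv = h_cond(rho_abc, rho_bc, 2) + h_cond(rho_bc, sigma_c, 2)

print(lhs_iii - rhs_iii, lhs_iv - rhs_iv)
```

Both printed gaps should be nonnegative up to rounding; note that the individual quantities may be negative for such generic quantum states.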

Having established subadditivity and a recombination-chain-rule, we will address the problem
of finding a converse splitting-chain-rule in the next section. Before doing so, however, we will
mention another property of the min-entropy which will be important for our purposes. This is
the fact that the entropy of a state $|\Psi'\rangle = Q|\Psi\rangle$ obtained by applying a projection to a state $|\Psi\rangle$ is
lower bounded by the entropy of the original state. We will later see that this allows us to retain
information about the entropy when going from a state to its split descendants.

Note that this statement is not generally true, but depends crucially on where the projection
acts.

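Lemma 5.3 (Monotony under projections). Let $|\Psi_{ABC}\rangle$ be a pure state, let $Q_C$ be an operator
on C and let $|\Psi'_{ABC}\rangle := Q_C|\Psi_{ABC}\rangle$. Let $\rho_{ABC}$ and $\rho'_{ABC}$ be the corresponding density operators.
Then

$$H(A|BC)_{\rho'} \geq H(A|BC)_{\rho} .$$

Furthermore, if $Q_C$ is a projector, then

$$H(A|B)_{\frac{\rho'}{\sigma}} \geq H(A|B)_{\frac{\rho}{\sigma}} \quad\text{and}\quad H(\emptyset|B)_{\frac{\rho'}{\sigma}} \geq H(\emptyset|B)_{\frac{\rho}{\sigma}}$$

for arbitrary $\sigma = \sigma_B$.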

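Proof. To prove the first inequality, let $Q_C$ be arbitrary. Applying $Q_C$ from the left and $Q_C^{\dagger}$ from
the right to both sides of the inequality

$$\rho_{ABC} \leq 2^{-H(A|BC)_{\rho}}\rho_{BC}$$

gives

$$\rho'_{ABC} \leq 2^{-H(A|BC)_{\rho}}\rho'_{BC}$$

by definition of $|\Psi'_{ABC}\rangle$ and properties of the partial trace. This proves the first inequality.
Let now $Q_C$ be a projector, and let $|\varphi_{AB}\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$ be arbitrary. Then

$$\mathrm{tr}\bigl(|\varphi_{AB}\rangle\langle\varphi_{AB}|\,\rho'_{AB}\bigr) = \mathrm{tr}\bigl((|\varphi_{AB}\rangle\langle\varphi_{AB}| \otimes \mathrm{id}_C)\rho'_{ABC}\bigr) = \mathrm{tr}\bigl((|\varphi_{AB}\rangle\langle\varphi_{AB}| \otimes Q_C)\rho_{ABC}\bigr)$$

by the cyclicity of the trace and the fact that $Q_C$ is a projector. In particular, with $Q_C^{\perp} := \mathrm{id}_C - Q_C$
denoting the projector onto the orthogonal complement of the image of $Q_C$, we have

$$\mathrm{tr}\bigl(|\varphi_{AB}\rangle\langle\varphi_{AB}|(\rho_{AB} - \rho'_{AB})\bigr) = \mathrm{tr}\bigl((|\varphi_{AB}\rangle\langle\varphi_{AB}| \otimes Q_C^{\perp})\rho_{ABC}\bigr) \geq 0 .$$

We conclude that $\rho'_{AB} \leq \rho_{AB}$. In particular,

$$\rho'_{AB} \leq \rho_{AB} \leq 2^{-H(A|B)_{\frac{\rho}{\sigma}}}\sigma_B \quad\text{and}\quad \rho'_B \leq \rho_B \leq 2^{-H(\emptyset|B)_{\frac{\rho}{\sigma}}}\sigma_B ,$$

which implies the claim. □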



5.4 Entropy-splitting: A splitting-chain-rule for min-entropy 

To introduce our splitting-chain-rule, we proceed in two steps: In Section 5.4.1, we show a simplified
version which does not restrict the number of states the original state is split into. As this is
irrelevant for the remainder of our proof, this section can be skipped; however, it nicely illustrates
the relevant features. The case of interest, where we split a given state into a fixed number m of
states, can be seen as a coarse-graining of the former. It will be the topic of Section 5.4.2.

5.4.1 A warm-up

The chain-rule we will prove in this section concerns a tripartite state $\rho_{ABC}$ with purification
$\rho_{ABCD} = |\Psi_{ABCD}\rangle\langle\Psi_{ABCD}|$ and an operator $\sigma_C$. We will show that we can split $|\Psi_{ABCD}\rangle$ into
a sum of states $\{|\Psi^{\alpha}_{ABCD}\rangle\}_{\alpha}$ as in (16), in a way that

$$H(A|BC)_{\rho^{\alpha}} + H(B|C)_{\frac{\rho^{\alpha}}{\sigma}} \geq H(AB|C)_{\frac{\rho}{\sigma}} \quad\text{for all } \alpha , \qquad (27)$$

where $\rho^{\alpha}_{ABCD} := |\Psi^{\alpha}_{ABCD}\rangle\langle\Psi^{\alpha}_{ABCD}|$. Note that by taking the supremum over $\sigma_C$, we immediately
obtain the inequality

$$H_{\min}(A|BC)_{\rho^{\alpha}} + H_{\min}(B|C)_{\rho^{\alpha}} \geq H_{\min}(AB|C)_{\rho} \quad\text{for all } \alpha$$

from (27). However, (27) makes a stronger assertion, and we will generally deal with statements
of this form.

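For the proof of (27), consider the eigendecomposition

$$\frac{\rho_{BC}}{\sigma_C} = \sum_{\alpha} \alpha\,P^{\alpha}_{BC}$$

of the conditional operator, where $P^{\alpha}_{BC}$ is the projector onto the eigenspace corresponding to the
eigenvalue $\alpha$.

We will use the operators $P^{\alpha}_{BC}$ to define our split states, which will be labeled by the spectrum
of $\frac{\rho_{BC}}{\sigma_C}$. Clearly, if we apply $P^{\alpha}_{BC}$ on both sides of the operator $\frac{\rho_{BC}}{\sigma_C}$, we end up with an operator
which has a single non-zero eigenvalue $\alpha$. While $P^{\alpha}_{BC}\frac{\rho_{BC}}{\sigma_C}P^{\alpha}_{BC}$ thus has a very simple form, it is very
different in nature from the original "unconditional" operator $\rho_{BC}$. Intuitively, it therefore makes
sense to multiply by $\sigma_C^{1/2}$. The appropriate definition of $|\Psi^{\alpha}_{ABCD}\rangle$ turns out to be just the result of
this, i.e., we can define

$$|\Psi^{\alpha}_{ABCD}\rangle := \sigma_C^{1/2}P^{\alpha}_{BC}\sigma_C^{-1/2}|\Psi_{ABCD}\rangle .$$

(As above, we assume that the support of $\sigma_C$ contains the support of $\rho_C$; hence, $\sigma_C$ is invertible on the relevant
subspace.)

It is easy to check that these states decompose $|\Psi_{ABCD}\rangle$ as in (16). They also satisfy (27), as we
will show now. First observe that $\rho^{\alpha}_{BC} = \sigma_C^{1/2}P^{\alpha}_{BC}\frac{\rho_{BC}}{\sigma_C}P^{\alpha}_{BC}\sigma_C^{1/2}$ by their very definition, and thus

$$\rho^{\alpha}_{BC} = \alpha\,\sigma_C^{1/2}P^{\alpha}_{BC}\sigma_C^{1/2} .$$

In particular, we have

$$\frac{\rho^{\alpha}_{BC}}{\sigma_C} = \alpha\,P^{\alpha}_{BC} , \qquad (28)$$

which implies that

$$H(B|C)_{\frac{\rho^{\alpha}}{\sigma}} = -\log\alpha . \qquad (29)$$

Combining (28) and (29) gives the statement

$$P^{\alpha}_{BC} \leq 2^{H(B|C)_{\frac{\rho^{\alpha}}{\sigma}}}\,\frac{\rho^{\alpha}_{BC}}{\sigma_C} . \qquad (30)$$

By definition of the quantity $H(AB|C)_{\frac{\rho}{\sigma}}$, we also have $\frac{\rho_{ABC}}{\sigma_C} \leq 2^{-H(AB|C)_{\frac{\rho}{\sigma}}}$. Applying the
projector $P^{\alpha}_{BC}$ on both sides of this inequality and using (30) leads to

$$P^{\alpha}_{BC}\frac{\rho_{ABC}}{\sigma_C}P^{\alpha}_{BC} \leq 2^{-H(AB|C)_{\frac{\rho}{\sigma}}}P^{\alpha}_{BC} \leq 2^{H(B|C)_{\frac{\rho^{\alpha}}{\sigma}} - H(AB|C)_{\frac{\rho}{\sigma}}}\,\frac{\rho^{\alpha}_{BC}}{\sigma_C} .$$

Multiplying this inequality from both sides by $\sigma_C^{1/2}$ immediately gives the desired statement (27),
since $\rho^{\alpha}_{ABC} = \sigma_C^{1/2}P^{\alpha}_{BC}\frac{\rho_{ABC}}{\sigma_C}P^{\alpha}_{BC}\sigma_C^{1/2}$.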
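
This concludes the proof of our simplified statement, where a state $|\Psi_{ABCD}\rangle$ is split into a
family $\{|\Psi^{\alpha}_{ABCD}\rangle\}_{\alpha}$, each of which obeys the splitting-chain-rule inequality (27). The number of
states is determined by the number of different eigenvalues of the operator $\frac{\rho_{BC}}{\sigma_C}$; indeed, each state
$|\Psi^{\alpha}_{ABCD}\rangle$ corresponds to an eigenvalue $\alpha$.

Before continuing, let us show the following useful properties of the states $\{|\Psi^{\alpha}_{ABCD}\rangle\}_{\alpha}$: They
are mutually orthogonal, and each state $|\Psi^{\alpha}_{ABCD}\rangle = Q^{\alpha}_{AD}|\Psi_{ABCD}\rangle$ is the result of applying a projection
$Q^{\alpha}_{AD}$ (which acts non-trivially only on A and D) to $|\Psi_{ABCD}\rangle$. This statement is the result of
using the complementarity property that is inherent in quantum states.

First observe that $\sigma_C^{-1/2}|\Psi_{ABCD}\rangle$ is a purification of the conditional operator $\frac{\rho_{BC}}{\sigma_C}$. Using the
Schmidt-decomposition, we can write

$$\sigma_C^{-1/2}|\Psi_{ABCD}\rangle = \sum_{\alpha} \sqrt{\alpha}\,|\alpha\rangle_{AD}|\alpha\rangle_{BC} ,$$

where $\{|\alpha\rangle_{AD}\}$ and $\{|\alpha\rangle_{BC}\}$ are eigenvectors with eigenvalue $\alpha$ of $\mathrm{tr}_{BC}\bigl(\frac{\rho_{ABCD}}{\sigma_C}\bigr)$ and $\frac{\rho_{BC}}{\sigma_C}$, respectively
(slightly abusing notation, we omit multiplicities). We can define $Q^{\alpha}_{AD}$ as the projector onto the
eigenspace of $\mathrm{tr}_{BC}\bigl(\frac{\rho_{ABCD}}{\sigma_C}\bigr)$ corresponding to the eigenvalue $\alpha$. We then clearly have

$$P^{\alpha}_{BC}\sigma_C^{-1/2}|\Psi_{ABCD}\rangle = Q^{\alpha}_{AD}\sigma_C^{-1/2}|\Psi_{ABCD}\rangle$$

for every $\alpha$. Since $\sigma_C^{1/2}$ and $Q^{\alpha}_{AD}$ act on different systems, they commute, and we obtain

$$|\Psi^{\alpha}_{ABCD}\rangle = \sigma_C^{1/2}Q^{\alpha}_{AD}\sigma_C^{-1/2}|\Psi_{ABCD}\rangle = Q^{\alpha}_{AD}|\Psi_{ABCD}\rangle ,$$

as claimed. The orthogonality of these states is now immediate.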
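
The warm-up construction can be reproduced numerically on a small example. The sketch below is an illustration under assumptions, not the authors' code: the pure state on four qubits A, B, C, D and the full-rank operator $\sigma_C$ are hypothetical random choices, and `ptrace` and `lift` are illustrative helper names. It builds the split states from the eigenprojectors of the conditional operator and checks that they sum to the original state, that they are mutually orthogonal, and that the largest eigenvalue of each split conditional operator equals $\alpha$, in line with (28) and (29).

```python
import numpy as np

def ptrace(rho, dims, keep):
    """Partial trace keeping the subsystems listed in `keep` (sorted indices)."""
    t = rho.reshape(list(dims) + list(dims))
    for k in sorted(set(range(len(dims))) - set(keep), reverse=True):
        t = np.trace(t, axis1=k, axis2=k + t.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

def lift(op_bc):
    """Embed an operator on BC into ABCD as id_A x op_BC x id_D."""
    return np.kron(np.kron(np.eye(2), op_bc), np.eye(2))

rng = np.random.default_rng(1)
dims = [2, 2, 2, 2]                              # qubits A, B, C, D
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi /= np.linalg.norm(psi)                       # hypothetical pure state

g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
sigma_c = g @ g.conj().T
sigma_c /= np.trace(sigma_c).real                # hypothetical full-rank sigma_C

w, v = np.linalg.eigh(sigma_c)
sqrt_bc = np.kron(np.eye(2), (v * np.sqrt(w)) @ v.conj().T)      # id_B x sigma_C^{1/2}
inv_sqrt_bc = np.kron(np.eye(2), (v / np.sqrt(w)) @ v.conj().T)  # id_B x sigma_C^{-1/2}

rho_bc = ptrace(np.outer(psi, psi.conj()), dims, [1, 2])
cond_bc = inv_sqrt_bc @ rho_bc @ inv_sqrt_bc     # conditional operator rho_BC / sigma_C
alphas, vecs = np.linalg.eigh(cond_bc)

split = []
for k in range(4):
    p_k = np.outer(vecs[:, k], vecs[:, k].conj())        # eigenprojector P^alpha_BC
    split.append(lift(sqrt_bc @ p_k @ inv_sqrt_bc) @ psi)

recombined = sum(split)
gram = np.array([[np.vdot(x, y) for y in split] for x in split])
off_diag = np.abs(gram - np.diag(np.diag(gram))).max()

max_dev = 0.0
for k in range(4):
    rho_k_bc = ptrace(np.outer(split[k], split[k].conj()), dims, [1, 2])
    lam_max = np.linalg.eigvalsh(inv_sqrt_bc @ rho_k_bc @ inv_sqrt_bc).max()
    max_dev = max(max_dev, abs(lam_max - alphas[k]))

print(np.linalg.norm(recombined - psi), off_diag, max_dev)
```

For a generic state the four eigenvalues are distinct, so each split state here corresponds to a one-dimensional eigenspace; the diagonal of `gram` gives the weights of the split states.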

5.4.2 Splitting into a fixed number of states

In this section, we show that the construction discussed in Section 5.4.1 can be adapted to yield
a fixed number m of states. This is quite straightforward: We simply divide the spectrum of the
conditional operator $\frac{\rho_{BC}}{\sigma_C}$ into m different intervals, one for each $\alpha \in [m]$. Instead of projecting
onto the eigenspace corresponding to a single eigenvalue, we use projectors $P^{\alpha}_{BC}$ onto the direct
sum of eigenspaces associated with eigenvalues in the corresponding interval.
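
The coarse-graining of the spectrum amounts to a simple binning of eigenvalues. As a minimal sketch, assume thresholds $h_0 < h_1 < \cdots < h_m$ and the convention used in the proof of Lemma 5.4 below, namely boundaries $\mu_\alpha = 2^{-h_\alpha}$ for $0 < \alpha < m$ with $\mu_0 = \infty$ and $\mu_m = -\infty$, so that the eigenvalue $\lambda$ receives the label $\alpha$ with $\mu_\alpha < \lambda \leq \mu_{\alpha-1}$. The function name `interval_index` and the numerical values are hypothetical.

```python
import math

def interval_index(lam, h):
    """Return alpha in {1,...,m} such that mu_alpha < lam <= mu_{alpha-1},
    where mu_0 = inf, mu_m = -inf and mu_alpha = 2**(-h[alpha]) otherwise."""
    m = len(h) - 1
    mu = [math.inf] + [2.0 ** (-h[a]) for a in range(1, m)] + [-math.inf]
    for alpha in range(1, m + 1):
        if mu[alpha] < lam <= mu[alpha - 1]:
            return alpha
    raise ValueError("eigenvalue not covered by any interval")

h = [0.0, 1.0, 2.0, 3.0]                  # hypothetical thresholds, m = 3
eigenvalues = [0.7, 0.5, 0.3, 0.25, 0.1]  # hypothetical spectrum
labels = [interval_index(lam, h) for lam in eigenvalues]
print(labels)  # [1, 2, 2, 3, 3]
```

Every eigenvalue falls into exactly one interval, which is what makes the resulting projectors mutually orthogonal and complete on the support.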

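Lemma 5.4 (Entropy splitting). Let $\rho_{ABCD} = |\Psi_{ABCD}\rangle\langle\Psi_{ABCD}|$ be a pure state and let $\sigma_C$ be
a nonnegative operator. Let $h_0 < h_1 < \cdots < h_m$ be an (m + 1)-tuple of monotonically increasing
real values with minimum and maximum given by

$$h_0 := H(B|C)_{\frac{\rho}{\sigma}}$$

$$h_m := H(AB|C)_{\frac{\rho}{\sigma}} - H(A|BC)_{\rho} .$$

Then there are mutually orthogonal projectors $\{Q^{\alpha}_{AD}\}_{\alpha\in[m]}$ with the property that

$$H(A|BC)_{\rho^{\alpha}} \geq H(AB|C)_{\frac{\rho}{\sigma}} - h_{\alpha} \qquad (31)$$

$$H(B|C)_{\frac{\rho^{\alpha}}{\sigma}} \geq h_{\alpha-1} \qquad (32)$$

where $\rho^{\alpha}_{ABCD} = |\Psi^{\alpha}_{ABCD}\rangle\langle\Psi^{\alpha}_{ABCD}|$ is defined as

$$|\Psi^{\alpha}_{ABCD}\rangle := Q^{\alpha}_{AD}|\Psi_{ABCD}\rangle . \qquad (33)$$

An alternative expression for these states is

$$|\Psi^{\alpha}_{ABCD}\rangle := \sigma_C^{1/2}P^{\alpha}_{BC}\sigma_C^{-1/2}|\Psi_{ABCD}\rangle , \qquad (34)$$

where $\{P^{\alpha}_{BC}\}_{\alpha\in[m]}$ are mutually orthogonal projectors. They satisfy

$$\sum_{\alpha\in[m]} |\Psi^{\alpha}\rangle = |\Psi\rangle . \qquad (35)$$

Note that, according to the chain rule (Lemma 5.2 (iv)), $h_0 \leq h_m$, i.e., there always exists a
tuple of reals as defined in the lemma.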

Proof. The proof of this statement is almost identical to the proof given in Section [5.4.11 but given 
here for completeness. For any a E [m — 1], define /io, :— 2^''°, and let /ig :— oo, /i,,,, := —oo. Note 
that (/ia)™=o is a monotonically decreasing sequence of values. 
Consider the Schmidt-decomposition 



'^^{■^abcd) = V ^\X)bc\X)ad (36) 



A 

of the "conditional" state ct^ \^ abcd) (the sum may include multiplicities). For every a € [to], 
we define the projectors and Q'Xj^ as 

Ae]/Jc,:Pc.-l] 

By definition, these operators satisfy ([55)) and Moreover, using the fact that Q^^i commutes 

]^ /2 _____ _____ 

with cTf^ , we conclude that ([55)1 and (15^ define the same state |^'J4bci5)- 

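Since $P^\alpha_{BC}$ is the projector onto the eigenspaces of $\frac{\rho_{BC}}{\sigma_C}$ which belong to the eigenvalues in $]\mu_\alpha, \mu_{\alpha-1}]$ for every $\alpha \in [m]$, we have

$$\mu_\alpha P^\alpha_{BC} \le P^\alpha_{BC}\, \frac{\rho_{BC}}{\sigma_C}\, P^\alpha_{BC} \le \mu_{\alpha-1} P^\alpha_{BC} . \tag{37}$$

We show that

$$P^\alpha_{BC}\, \frac{\rho_{BC}}{\sigma_C}\, P^\alpha_{BC} \le 2^{-h_{\alpha-1}} P^\alpha_{BC} \quad \text{for all } \alpha \in [m] . \tag{38}$$

This follows directly from (37) for $\alpha \ge 2$ since $2^{-h_{\alpha-1}} = \mu_{\alpha-1}$; for $\alpha = 1$, it is a consequence of the fact that the eigenvalues of $\frac{\rho_{BC}}{\sigma_C}$ are upper bounded by $2^{-h_0}$ by definition of $h_0$.

Claim (32) now directly follows from (38) and the fact that $\frac{\rho^\alpha_{BC}}{\sigma_C} = P^\alpha_{BC}\, \frac{\rho_{BC}}{\sigma_C}\, P^\alpha_{BC}$.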
Next we show that

$$P^\alpha_{BC}\, \frac{\rho_{ABC}}{\sigma_C}\, P^\alpha_{BC} \le 2^{-H(AB|C)_{\frac{\rho}{\sigma}} + h_\alpha}\, P^\alpha_{BC}\, \frac{\rho_{BC}}{\sigma_C}\, P^\alpha_{BC} \quad \text{for all } \alpha \in [m] . \tag{39}$$

We distinguish two cases: For $\alpha = m$, inequality (39) is equivalent to

$$P^\alpha_{BC}\, \frac{\rho_{ABC}}{\sigma_C}\, P^\alpha_{BC} \le 2^{-H(A|BC)_{\frac{\rho}{\rho}}}\, P^\alpha_{BC}\, \frac{\rho_{BC}}{\sigma_C}\, P^\alpha_{BC}$$

because of the definition of $h_m$. But this directly follows from $\rho_{ABC} \le 2^{-H(A|BC)_{\frac{\rho}{\rho}}} \rho_{BC}$ by multiplication from both sides with $P^\alpha_{BC}\sigma_C^{-1/2}$ and its adjoint.

For $1 \le \alpha < m$, we use the fact that the first inequality of (37) is equivalent to

$$P^\alpha_{BC} \le 2^{h_\alpha}\, P^\alpha_{BC}\, \frac{\rho_{BC}}{\sigma_C}\, P^\alpha_{BC}$$

since $2^{-h_\alpha} = \mu_\alpha$. Substituting this into the inequality

$$P^\alpha_{BC}\, \frac{\rho_{ABC}}{\sigma_C}\, P^\alpha_{BC} \le 2^{-H(AB|C)_{\frac{\rho}{\sigma}}}\, P^\alpha_{BC}$$

(which directly follows from $\frac{\rho_{ABC}}{\sigma_C} \le 2^{-H(AB|C)_{\frac{\rho}{\sigma}}}$) immediately gives (39) for all $1 \le \alpha < m$. This concludes the proof of the auxiliary statement (39).

The proof of (31) is now straightforward, based on (39). Multiplying the latter inequality by $\sigma_C^{1/2}$ from both the left and the right yields

$$\rho^\alpha_{ABC} \le 2^{-H(AB|C)_{\frac{\rho}{\sigma}} + h_\alpha}\, \rho^\alpha_{BC} ,$$

which implies (31). □

Figure 1: This figure illustrates the basic building block for our arguments (cf. Corollary 5.5). A state $|\Psi\rangle$ can be decomposed into a sum of $m$ orthogonal states; in the figure, $m = 2$. We illustrate this by a tree; the original state sits at the root, whereas the split states sit at nodes labeled by $\alpha \in [m]$. The state at the root is the sum of its descendants, which are identical to the leaves in this case. Going from a node to its descendants is achieved by applying corresponding projection operators.

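In the previous lemma, we did not specify the intervals $]h_{\alpha-1}, h_\alpha]$ that are used to partition the spectrum of the conditional operator $\frac{\rho_{BC}}{\sigma_C}$. A simple choice is to partition the spectrum into $m$ intervals of equal length. This results in the following splitting-chain-rule, which will be our basic tool in what follows.

Corollary 5.5. Let $\rho_{ABCD} = |\Psi_{ABCD}\rangle\langle\Psi_{ABCD}|$ be a pure state and let $\sigma_C$ be a nonnegative operator. Then for any $m \in \mathbb{N}$

$$H(A|BC)_{\frac{\rho^\alpha}{\rho^\alpha}} + H(B|C)_{\frac{\rho^\alpha}{\sigma}} \ge H(AB|C)_{\frac{\rho}{\sigma}} - \frac{\Delta}{m}$$

where

$$\Delta := H(AB|C)_{\frac{\rho}{\sigma}} - H(A|BC)_{\frac{\rho}{\rho}} - H(B|C)_{\frac{\rho}{\sigma}} ,$$

and where $\rho^\alpha_{ABCD} = |\Psi^\alpha_{ABCD}\rangle\langle\Psi^\alpha_{ABCD}|$ is defined by (33) or (34) in terms of families of mutually orthogonal projectors $\{Q^\alpha_{AD}\}_{\alpha\in[m]}$ and $\{P^\alpha_{BC}\}_{\alpha\in[m]}$, as in Lemma 5.4.

We are usually able to obtain a bound on $\Delta$; for a comparatively large value of $m$, we therefore get an approximation of (27), which is a converse to the recombination-chain-rule (Item (iv) of Lemma 5.2).

Proof. Here we choose $h_\alpha = h_0 + \alpha\frac{\Delta}{m}$ for all $\alpha \in \{0, \ldots, m\}$. Then $h_\alpha - h_{\alpha-1} = \frac{\Delta}{m}$, and adding (31) and (32) gives the claim. □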
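
As a small illustration of the equal-length choice, the sketch below computes the thresholds $h_\alpha$ and the values $2^{-h_\alpha}$ from given endpoints. The function name `equal_thresholds` and the numerical endpoints are hypothetical; in the text the outermost spectral thresholds are additionally set to $\pm\infty$, which this sketch does not do.

```python
def equal_thresholds(h0, hm, m):
    """Return ([h_0, ..., h_m], [2^-h_0, ..., 2^-h_m]) for equally spaced h."""
    delta = hm - h0
    hs = [h0 + a * delta / m for a in range(m + 1)]
    mus = [2.0 ** (-h) for h in hs]
    return hs, mus

# hypothetical endpoints: h_0 = 1 bit, h_m = 3 bits, m = 4 intervals
hs, mus = equal_thresholds(1.0, 3.0, 4)
# consecutive thresholds differ by delta / m = 0.5, the loss term of the corollary
gaps = [hs[a] - hs[a - 1] for a in range(1, len(hs))]
```

Increasing `m` shrinks every gap, which is why the loss term decreases as $1/m$.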



We point out that the statement of Corollary 5.5 is also valid with $B$ removed from all expressions. This is because we can always adjoin a trivial system $B$ with Hilbert space $\mathcal{H}_B = \mathbb{C}$.

For later use, we establish a few additional properties of the states $|\Psi^\alpha_{ABCD}\rangle$. We first show that the states $|\Psi^\alpha_{ABCD}\rangle$ have the same classicality properties as the original state $|\Psi_{ABCD}\rangle$.
Remark 5.6 (Preservation of classicality properties). Suppose that $D = D_1D_2$ is bipartite, and that $\rho_{ABCD_1}$ is classical on $A$, $B$, and $D_1$ (relative to some orthonormal bases of these subsystems). Then $\rho^\alpha_{ABCD_1}$ is classical on $A$, $B$, and $D_1$ (relative to the same bases), for any $\alpha \in [m]$.

Proof. According to the discussion at the end of Section 5.1 about classical states and (34), it suffices to show that the operator $\sigma_C^{1/2} P^\alpha_{BC}\, \sigma_C^{-1/2}$ has the form

$$\sigma_C^{1/2} P^\alpha_{BC}\, \sigma_C^{-1/2} = \sum_b |b\rangle\langle b| \otimes O^b_C , \tag{40}$$

for some operators $\{O^b_C\}_b$ on $C$, where $\{|b\rangle\}_b$ is the eigenbasis of $\rho_B$.

Because $\sigma_C^{-1/2}$ acts only on $C$, the state $\sigma_C^{-1/2}|\Psi_{ABCD}\rangle$ is classical on $B$ when tracing out $A$ and $D$, i.e.,

$$\frac{\rho_{BC}}{\sigma_C} = \sum_b |b\rangle\langle b|_B \otimes \theta^b_C$$

for some nonnegative operators $\{\theta^b_C\}_b$ on $C$, because this is true for the original state $|\Psi_{ABCD}\rangle$ by assumption. In particular, the eigenvectors of $\frac{\rho_{BC}}{\sigma_C}$ are of the form $|b\rangle_B|\varphi\rangle_C$. Since $P^\alpha_{BC}$ is a projector onto an eigenspace of this operator, this proves that $P^\alpha_{BC}$ has the form $P^\alpha_{BC} = \sum_b |b\rangle\langle b|_B \otimes T^b_C$ for some operators $\{T^b_C\}_b$ on $C$. This immediately gives the claim (40). □

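As explained in Section 4.4, we will later apply the splitting-chain-rule recursively. In particular, we will further split up split states. Conveniently, orthogonality properties are preserved under such successive splitting operators, as we now explain.

For concreteness, suppose that we split a state $|\Psi_{A_1B_1CD_1}\rangle$ into states $\{|\Psi^{\alpha_1}_{A_1B_1CD_1}\rangle\}_{\alpha_1}$ satisfying

$$H(A_1|B_1C)_{\frac{\rho^{\alpha_1}}{\rho^{\alpha_1}}} + H(B_1|C)_{\frac{\rho^{\alpha_1}}{\sigma}} \gtrsim H(A_1B_1|C)_{\frac{\rho}{\sigma}} .$$

Assume further that $B_1 = A_2B_2$ is bipartite. We can then split each $|\Psi^{\alpha_1}_{A_1B_1CD_1}\rangle$ further into a family of states $\{|\Psi^{\alpha_1,\alpha_2}_{A_1B_1CD_1}\rangle\}_{\alpha_2}$ such that

$$H(A_2|B_2C)_{\frac{\rho^{\alpha_1,\alpha_2}}{\rho^{\alpha_1,\alpha_2}}} + H(B_2|C)_{\frac{\rho^{\alpha_1,\alpha_2}}{\sigma}} \gtrsim H(A_2B_2|C)_{\frac{\rho^{\alpha_1}}{\sigma}} = H(B_1|C)_{\frac{\rho^{\alpha_1}}{\sigma}}$$

for all $(\alpha_1, \alpha_2)$. Diagrammatically, the grouping/splitting of systems can be drawn as

[Diagram: the systems $A_1$, $B_1$ and $C$, with $B_1$ further divided into $A_2B_2$.]

Clearly, a desirable property is that these states are orthogonal, such that

$$\sum_{(\alpha_1,\alpha_2)} |\Psi^{\alpha_1,\alpha_2}_{A_1B_1CD_1}\rangle = |\Psi_{A_1B_1CD_1}\rangle$$

is a decomposition of $|\Psi_{A_1B_1CD_1}\rangle$ into mutually orthogonal states.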

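We will prove this statement by considering the corresponding projection operators $\{Q^{\alpha_1}_{A_1D_1}\}_{\alpha_1}$ and $\{Q^{\alpha_1,\alpha_2}_{A_2D_2}\}_{(\alpha_1,\alpha_2)}$ (where $D_2 = D_1A_1$) defined by the splitting-chain-rule; i.e., these are operators satisfying

$$|\Psi^{\alpha_1}\rangle = Q^{\alpha_1}_{A_1D_1}|\Psi\rangle , \qquad |\Psi^{\alpha_1,\alpha_2}\rangle = Q^{\alpha_1,\alpha_2}_{A_2D_2}|\Psi^{\alpha_1}\rangle .$$

By definition, for every $\alpha_1$, the operators $\{Q^{\alpha_1,\alpha_2}_{A_2D_2}\}_{\alpha_2}$ are mutually orthogonal for different $\alpha_2$. We will now show that these operators satisfy the inequality

$$Q^{\alpha_1,\alpha_2}_{A_2D_2} \le Q^{\alpha_1}_{A_1D_1} \tag{41}$$

for all $(\alpha_1, \alpha_2)$. This expresses the fact that the operators $Q^{\alpha_1,\alpha_2}_{A_2D_2}$ are a "refinement" of $Q^{\alpha_1}_{A_1D_1}$. In particular, their images are orthogonal for different values of $\alpha_1$, and we have $Q^{\alpha_1,\alpha_2}_{A_2D_2} Q^{\alpha_1}_{A_1D_1} = Q^{\alpha_1,\alpha_2}_{A_2D_2}$ (cf. Lemma B.2). In other words, each of the states $|\Psi^{\alpha_1,\alpha_2}_{A_1B_1CD_1}\rangle$ is obtained by applying a single projection to $|\Psi_{A_1B_1CD_1}\rangle$.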

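The proof involves the following property of the projection operators.

Remark 5.7 (Operator inequalities). Let $\rho_{ABCD} = |\Psi_{ABCD}\rangle\langle\Psi_{ABCD}|$, $Q^\alpha_{AD}$, and $\rho^\alpha_{ABCD} = |\Psi^\alpha_{ABCD}\rangle\langle\Psi^\alpha_{ABCD}|$ be defined as in Lemma 5.4. Let $\mathrm{id}_{\mathrm{supp}(\rho)}$ denote the projector onto the support of the operator $\rho$. Then

$$\mathrm{id}_{\mathrm{supp}(\rho^\alpha_{ADF})} \le Q^\alpha_{AD} \le \mathrm{id}_{\mathrm{supp}(\rho_{AD})}$$

for any subsystem $F \subset BC$. (By that, we mean that $\mathcal{H}_{BC}$ is the product $\mathcal{H}_{BC} = \mathcal{H}_F \otimes \mathcal{H}_G$ of two systems $F$ and $G$, such that $BC = FG$.)

Indeed, the second inequality of this remark gives

$$Q^{\alpha_1,\alpha_2}_{A_2D_2} \le \mathrm{id}_{\mathrm{supp}(\rho^{\alpha_1}_{A_1A_2D_1})}$$

because $D_2 = D_1A_1$, whereas the first inequality with $F = A_2$ (recall that $B_1 = A_2B_2$) leads to

$$\mathrm{id}_{\mathrm{supp}(\rho^{\alpha_1}_{A_1A_2D_1})} \le Q^{\alpha_1}_{A_1D_1} .$$

This proves the fundamental property (41).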

It remains to give a proof of the statement made in the remark. 

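Proof. According to Lemma B.2, it suffices to show that

$$\mathrm{supp}(\rho^\alpha_{ADF}) \subseteq \mathrm{supp}(Q^\alpha_{AD} \otimes \mathrm{id}_F)$$

and

$$\mathrm{supp}(Q^\alpha_{AD}) \subseteq \mathrm{supp}(\rho_{AD}) . \tag{42}$$

The first of these inclusions is a direct consequence of the fact that $\rho^\alpha_{ADF} = (Q^\alpha_{AD} \otimes \mathrm{id}_F)\, \rho_{ADF}\, (Q^\alpha_{AD} \otimes \mathrm{id}_F)$. To prove the second inclusion, observe that $Q^\alpha_{AD}$ projects onto an eigenspace of the conditional operator $\mathrm{tr}_{BC}(\sigma_C^{-1/2}|\Psi_{ABCD}\rangle\langle\Psi_{ABCD}|\sigma_C^{-1/2})$, and thus

$$\mathrm{supp}(Q^\alpha_{AD}) \subseteq \mathrm{supp}\big(\mathrm{tr}_{BC}(\sigma_C^{-1/2}|\Psi_{ABCD}\rangle\langle\Psi_{ABCD}|\sigma_C^{-1/2})\big) .$$

The inclusion (42) then follows because the latter set is contained in $\mathrm{supp}(\rho_{AD})$. This can be verified for example by using a Schmidt decomposition $|\Psi_{ABCD}\rangle = \sum_\mu \sqrt{\mu}\, |\mu_{BC}\rangle|\mu_{AD}\rangle$ of $|\Psi_{ABCD}\rangle$. In terms of this decomposition, we have

$$\mathrm{tr}_{BC}(\sigma_C^{-1/2}|\Psi_{ABCD}\rangle\langle\Psi_{ABCD}|\sigma_C^{-1/2}) = \sum_{\mu,\mu'} \sqrt{\mu\mu'}\; \mathrm{tr}\big(\sigma_C^{-1/2}|\mu_{BC}\rangle\langle\mu'_{BC}|\sigma_C^{-1/2}\big)\, |\mu_{AD}\rangle\langle\mu'_{AD}|$$

and the support of this operator is clearly contained in $\mathrm{span}\{|\mu_{AD}\rangle\} = \mathrm{supp}(\rho_{AD})$. □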



Footnote (to Remark 5.7): Recall that, according to our convention, the first inequality is an abbreviation for the operator inequality $\mathrm{id}_{\mathrm{supp}(\rho^\alpha_{ADF})} \le Q^\alpha_{AD} \otimes \mathrm{id}_F$ (see Section 5.1 for more details).
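5.5 Recombination-rules for split states

As discussed in Section 4.3, we will need a converse to the splitting rule which shows that the entropy of the original state is large if it is large for each split state. Here we show how this works in detail in the simplest case. Again, this section may be omitted, but it is instructive for the slightly more intricate case we will need below (cf. Lemma 6.7).

Remarkably, the statement we will prove is generally true for any system $F$ which we do not condition on.

Lemma 5.8. Let $|\Psi_{ABCD}\rangle$, $\{|\Psi^\alpha_{ABCD}\rangle\}_{\alpha\in[m]}$ and $\{Q^\alpha_{AD}\}_{\alpha\in[m]}$ be as in Corollary 5.5. Let $F \subset ABD$ be an arbitrary subsystem. Then

$$\min_{\alpha\in[m]} H(F|C)_{\frac{\rho^\alpha}{\sigma}} - 2\log m \le H(F|C)_{\frac{\rho}{\sigma}} .$$

Proof. Let $\lambda := 2^{-\min_{\alpha\in[m]} H(F|C)_{\frac{\rho^\alpha}{\sigma}}}$. We then have $\rho^\alpha_{FC} \le \lambda\, \mathrm{id}_F \otimes \sigma_C$ for all $\alpha \in [m]$, or

$$\frac{\rho^\alpha_{FC}}{\sigma_C} \le \lambda\, \mathrm{id}_{FC} .$$

Using the commutativity of $Q^\alpha_{AD}$ and $\sigma_C$, we can rewrite this as

$$\mathrm{tr}_{\overline{FC}}\Big(Q^\alpha_{AD}\, \frac{\rho_{ABCD}}{\sigma_C}\, Q^\alpha_{AD}\Big) \le \lambda\, \mathrm{id}_{FC} \quad \text{for all } \alpha \in [m] ,$$

where $\mathrm{tr}_{\overline{FC}}$ denotes the partial trace over all systems other than $FC$. At this point, we use a statement about operators which we state as Lemma B.1 in the appendix. It tells us that the previous inequalities imply that

$$\mathrm{tr}_{\overline{FC}}\Big(Q_{AD}\, \frac{\rho_{ABCD}}{\sigma_C}\, Q_{AD}\Big) \le \lambda m^2\, \mathrm{id}_{FC} ,$$

where $Q_{AD} = \sum_{\alpha\in[m]} Q^\alpha_{AD}$. Recall that the operators $Q^\alpha_{AD}$ are defined in terms of the eigenspaces of $\mathrm{tr}_{BC}\big(\frac{\rho_{ABCD}}{\sigma_C}\big)$. Their definition implies that $Q_{AD}$ restricted to the support of $\mathrm{tr}_{BC}\big(\frac{\rho_{ABCD}}{\sigma_C}\big)$ is equal to the identity. Thus the last inequality simply says

$$\frac{\rho_{FC}}{\sigma_C} \le \lambda m^2\, \mathrm{id}_{FC} .$$

Multiplying from the left and the right by $\sigma_C^{1/2}$ gives the claim. □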



6 Entropy sampling 

We now return to our main problem, i.e., the analysis of a state $\rho_{X^nE}$ with classical part $X^n = (X_1, \ldots, X_n)$ and the relation of the entropy $H^\varepsilon_{\min}(X_S|E)_\rho$ of a randomly chosen subset $S \subset [n]$ to the entropy $H_{\min}(X^n|E)_\rho$ of all classical parts. We proceed as sketched in Section 4. In Section 6.1, we describe the recursive splitting of the joint min-entropy $H_{\min}(X^n|E)_\rho$ into a sum of individual contributions of each random variable. We then discuss how high-entropy components can be recombined to a state with high min-entropy (Section 6.2). In particular, we relate the smooth min-entropy $H^\varepsilon_{\min}(X_S|E)_\rho$ to the probability weight $\omega(\Gamma)$ of a certain set $\Gamma$ under a given distribution $\omega$. We then study the behavior of a sampler with respect to this quantity. For this purpose, we introduce the concept of a parallel sampler in Section 6.3. We then show that with high probability over the choice of $S$, the probability $\omega(\Gamma)$ of interest is large (Section 6.4).

We finally combine these components in Section 6.5, where we state our main result, i.e., the preservation of (smooth) min-entropy rates under sampling.
Figure 2: A schematic picture of the states introduced in Definition 6.1 for $n = 3$ and $m = 2$. As in Figure 1, the immediate descendants of every node give an orthogonal decomposition of the state associated with it, and are obtained by applying corresponding projection operators. In Lemma 6.3, we will show that the states at level $j$ are orthogonal, for every level $j \in [n]$. In particular, this means that the leaves form an orthogonal decomposition of the original state. Observe that we label each vertex by the corresponding sequence of splitting operators; in particular, the leaves carry labels $\alpha^n \in [m]^n$.

6.1 Splitting 

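We apply the splitting-chain-rule recursively to a state $\rho_{X^nE}$, where $X_1, \ldots, X_n$ are random variables on an alphabet $\mathcal{X}$. Let $|\Psi_{X^nER}\rangle$ be a purification of $\rho_{X^nE}$ (for simplicity, we will henceforth often omit subscripts denoting systems, where there is no potential for confusion). Furthermore, let $\sigma_E$ be a nonnegative operator on $E$. In Figure 2, we visualise the set of states introduced in the following definition by a tree.

Definition 6.1 ("Split states"). Let $\rho_{X^nER} = |\Psi\rangle\langle\Psi|$ and let $\rho^{\alpha^j}_{X^nER} = |\Psi^{\alpha^j}\rangle\langle\Psi^{\alpha^j}|$ be pure states recursively defined as follows. Set $|\Psi^{\alpha^0}\rangle := |\Psi\rangle$. To obtain $|\Psi^{\alpha^j}\rangle$ for $j \in [n]$ and $\alpha^j = (\alpha_j, \alpha^{j-1}) \in [m]^j = [m] \times [m]^{j-1}$, apply Corollary 5.5 to the state $|\Psi^{\alpha^{j-1}}\rangle$ with $A = X_j$, $B = X_{>j}$, $C = E$, $D = X_{\le j-1}R$. This gives projectors $P^{\alpha^j}_{X_{>j}E}$ and $Q^{\alpha^j}_{X_{\le j}R}$. We define $|\Psi^{\alpha^j}\rangle$ as in Corollary 5.5 as

$$|\Psi^{\alpha^j}\rangle := Q^{\alpha^j}_{X_{\le j}R}|\Psi^{\alpha^{j-1}}\rangle .$$

Spelling out this recursive definition, we have

$$|\Psi^{\alpha^j}\rangle = Q^{\alpha^j}_{X_{\le j}R} \cdots Q^{\alpha^1}_{X_{\le 1}R}|\Psi\rangle \tag{43}$$
$$\phantom{|\Psi^{\alpha^j}\rangle} = \tilde{P}^{\alpha^j}_{X_{>j}E} \cdots \tilde{P}^{\alpha^1}_{X_{>1}E}|\Psi\rangle , \tag{44}$$

where $\tilde{P}^{\alpha^j}_{X_{>j}E} = \sigma_E^{1/2} P^{\alpha^j}_{X_{>j}E}\, \sigma_E^{-1/2}$.

The following auxiliary result will prove useful. We will apply it to show that the states on each level of the tree in Figure 2 are mutually orthogonal (by level, we mean all vertices at a fixed depth of the tree, i.e., distance from the root). In fact, any two states in different subtrees are mutually orthogonal, but we will not need this statement here. The proof of the following lemma relies on the fact that splitting preserves orthogonality. It is analogous to the derivation of (41) in Section 5.4.

Lemma 6.2. For all $j \ge k$ and $\alpha^j \in [m]^j$ we have

$$Q^{\alpha^j}_{X_{\le j}R}\, Q^{\alpha^k}_{X_{\le k}R} = Q^{\alpha^k}_{X_{\le k}R}\, Q^{\alpha^j}_{X_{\le j}R} = Q^{\alpha^j}_{X_{\le j}R} .$$

Moreover, the operators $\{Q^{\alpha^j}_{X_{\le j}R}\}_{\alpha^j\in[m]^j}$ are pairwise orthogonal for a fixed $j \in [n]$.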
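Proof. Note that the first claim trivially holds for $j = k$ since the operators are projectors. Observe that for any $j > 1$, we have

$$Q^{\alpha^j}_{X_{\le j}R} \le \mathrm{id}_{\mathrm{supp}(\rho^{\alpha^{j-1}}_{X_{\le j}R})} \le Q^{\alpha^{j-1}}_{X_{\le j-1}R} ,$$

where we used Remark 5.7 twice (with $F = X_j$). Inductively, we obtain

$$Q^{\alpha^j}_{X_{\le j}R} \le Q^{\alpha^k}_{X_{\le k}R}$$

for any $k < j$. The first claim therefore follows from Lemma B.2.

The orthogonality of the operators $\{Q^{\alpha^j}_{X_{\le j}R}\}_{\alpha^j\in[m]^j}$ immediately follows from the first claim: For $\alpha^j \ne \beta^j \in [m]^j$, let $k \le j$ be the minimal index in which they differ, i.e., $\alpha_k \ne \beta_k$ and $\alpha^{k-1} = \beta^{k-1}$. We then have by the first claim

$$Q^{\alpha^j}_{X_{\le j}R}\, Q^{\beta^j}_{X_{\le j}R} = Q^{\alpha^j}_{X_{\le j}R}\, Q^{\alpha^k}_{X_{\le k}R}\, Q^{\beta^k}_{X_{\le k}R}\, Q^{\beta^j}_{X_{\le j}R} = 0 ,$$

since the operators $Q^{\alpha^k}_{X_{\le k}R} = Q^{(\alpha_k,\alpha^{k-1})}_{X_{\le k}R}$ and $Q^{\beta^k}_{X_{\le k}R} = Q^{(\beta_k,\alpha^{k-1})}_{X_{\le k}R}$ are orthogonal for $\alpha_k \ne \beta_k$. □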

As promised, we now establish a few properties of the split states such as their orthogonality and the fact that they are partly classical, like the original state.

Lemma 6.3 (Properties of the split states). The states introduced in Definition 6.1 have the following properties.

(i) The states $\{|\Psi^{\alpha^j}\rangle\}_{\alpha^j\in[m]^j}$ are pairwise orthogonal for a fixed $j \in [n]$.

(ii) The states $\{|\Psi^{\alpha^n}\rangle\}_{\alpha^n\in[m]^n}$ form an orthogonal resolution of $|\Psi\rangle$, i.e., $\sum_{\alpha^n\in[m]^n} |\Psi^{\alpha^n}\rangle = |\Psi\rangle$. In particular, $\omega(\alpha^n) := \mathrm{tr}\, |\Psi^{\alpha^n}\rangle\langle\Psi^{\alpha^n}|$ defines a probability distribution on $[m]^n$.

(iii) The state $|\Psi^{\alpha^j}\rangle$ can be obtained by a single projection on $X_{\le j}R$, i.e., $|\Psi^{\alpha^j}\rangle = Q^{\alpha^j}_{X_{\le j}R}|\Psi\rangle$.

(iv) For every $j \in [n]$ and $\alpha^j \in [m]^j$, the state $\rho^{\alpha^j}_{X^nE}$ is classical on $X^n$.

(v) For all $\alpha = \alpha^j$, we have $H(\emptyset|E)_{\frac{\rho^\alpha}{\sigma}} \ge H(\emptyset|E)_{\frac{\rho}{\sigma}}$.

The probability distribution $\omega$ (introduced in (ii)) on the leaves $[m]^n$ of the tree in Figure 2 will play an important role in our recombination step. Inequality (v) can be seen as an expression of the fact that splitting does not affect the part we condition on.

Proof. First observe that (iii) follows inductively from Lemma 6.2 and expression (43). Similarly, the orthogonality (i) follows from this lemma and (iii). Statement (ii) follows by induction over $j$ from (35). Statement (iv) follows inductively from Remark 5.6 applied with $D_1 = X_{\le j-1}$ and $D_2 = R$. Finally, the claim (v) directly follows from (iii) and Lemma 5.3 (with $C = X_{\le j-1}$ and $B = E$). □

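The main reason for introducing the split states $\{|\Psi^{\alpha^j}\rangle\}$ is the fact that they allow us to split the joint entropy $H(X^n|E)_{\frac{\rho}{\sigma}}$ into individual contributions according to the splitting-chain-rule (Corollary 5.5). We express this central result as follows.

Theorem 6.4 ("Splitting"). The split states satisfy

$$H(\emptyset|E)_{\frac{\rho^{\alpha^n}}{\sigma}} + \sum_{j=1}^n H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^j}}{\rho^{\alpha^j}}} \ge H(X_{>0}|E)_{\frac{\rho}{\sigma}} - \frac{n}{m}\log|\mathcal{X}| \tag{45}$$

for any $\alpha^n \in [m]^n$.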
Figure 3: The tree $T_n = T_3$, for $m = 2$ and $n = 3$. Every path from the root to a leaf/spade is specified by an $n$-tuple $\alpha^n \in \{1,2\}^n$. We will attach a weight corresponding to an entropy to every edge in the graph.

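Proof. In the following, we sometimes refer to the empty set as $X_{>n}$. By construction and Corollary 5.5, the split states satisfy the inequalities

$$H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^j}}{\rho^{\alpha^j}}} + H(X_{>j}|E)_{\frac{\rho^{\alpha^j}}{\sigma}} \ge H(X_{>j-1}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}} - \frac{\Delta_j}{m} \quad \text{for all } j \in [n] , \tag{46}$$

where $\Delta_j = H(X_{>j-1}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}} - H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^{j-1}}}{\rho^{\alpha^{j-1}}}} - H(X_{>j}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}}$. Summing these inequalities over all $j \in [n]$, we get

$$\sum_{j\in[n]} H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^j}}{\rho^{\alpha^j}}} \ge \sum_{j\in[n]} \Big(H(X_{>j-1}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}} - H(X_{>j}|E)_{\frac{\rho^{\alpha^j}}{\sigma}}\Big) - \sum_{j\in[n]} \frac{\Delta_j}{m} .$$

Because the rhs is a telescoping sum, i.e.,

$$\sum_{j\in[n]} \Big(H(X_{>j-1}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}} - H(X_{>j}|E)_{\frac{\rho^{\alpha^j}}{\sigma}}\Big) = H(X_{>0}|E)_{\frac{\rho}{\sigma}} - H(X_{>n}|E)_{\frac{\rho^{\alpha^n}}{\sigma}} ,$$

this gives

$$H(X_{>n}|E)_{\frac{\rho^{\alpha^n}}{\sigma}} + \sum_{j\in[n]} H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^j}}{\rho^{\alpha^j}}} \ge H(X_{>0}|E)_{\frac{\rho}{\sigma}} - \sum_{j\in[n]} \frac{\Delta_j}{m} . \tag{47}$$

Note that $\rho^{\alpha^{j-1}}_{X^nE}$ is classical on $X^n$, according to Lemma 6.3 (iv). We can therefore use the dimension bound (ii) of Lemma 5.2 and the positivity of the min-entropy (Lemma 5.2) for classical systems to get

$$\log|\mathcal{X}| \ge H(X_jX_{>j}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}} - H(X_{>j}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}}$$
$$\ge H(X_{>j-1}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}} - H(X_{>j}|E)_{\frac{\rho^{\alpha^{j-1}}}{\sigma}} - H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^{j-1}}}{\rho^{\alpha^{j-1}}}} = \Delta_j \quad \text{for all } j \in [n] .$$

The claim follows from this and (47). □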

To put the statement of Theorem 6.4 into a more concise form, it is useful to think of the entropic quantities appearing on the lhs of the inequality (45) as attached to the tree given in Figure 2. For convenience, we use a slightly modified tree $T_n$ which has spades attached to the leaves of the original tree (see Figure 3).

We can then attach weights to the edges of $T_n$ according to the rule $v_\rho$ given in Figure 4. For a path $\alpha^n \in [m]^n$ from the root to a spade (i.e., leaf), we define the weight $v_\rho(\alpha^n)$ of the path $\alpha^n$ as the sum of the values on the edges along this path. In particular, for the weighting $v_\rho$ specified by Figure 4, the weight $v_\rho(\alpha^n)$ coincides with the lhs of (45) in Theorem 6.4.

[Figure: the edge into the vertex $\alpha^i$ carries the weight $H(X_i|X_{>i}E)_{\frac{\rho^{\alpha^i}}{\rho^{\alpha^i}}}$ for all $i \in [n]$; the edge to the spade below the leaf $\alpha^n$ carries $H(\emptyset|E)_{\frac{\rho^{\alpha^n}}{\sigma}}$.]

Figure 4: The weighting $v_\rho$ of the edges.

More generally, we slightly abuse notation and define the value $w(T)$ of a tree $T$ with weighting $w$ as the minimal value of a path from the root to a leaf. Theorem 6.4 can then be reformulated as follows.

Theorem 6.4'. Let $T_n$ be the tree introduced in Figure 3 and let $v_\rho$ be the weighting specified by Figure 4. Then $v_\rho(T_n) \ge H(X_{>0}|E)_{\frac{\rho}{\sigma}} - \frac{n}{m}\log|\mathcal{X}|$.

We will later be interested in different weightings. We will also show a converse to this state- 
ment: If the value of a tree is large, then so is the corresponding entropy. 
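
The notion of the value of a weighted tree can be made concrete with a short sketch. It assumes that a path is represented as a tuple of labels read from the root downwards and that edge weights are supplied by callables; the names `path_weight`, `tree_value` and the toy weights are illustrative assumptions only.

```python
from itertools import product

def path_weight(path, edge_weight, spade_weight):
    """Sum of the weights along one root-to-spade path.

    edge_weight(prefix) is the weight of the edge entering the vertex
    labeled by `prefix`; spade_weight(path) is the weight of the final
    edge to the spade.
    """
    total = spade_weight(path)
    for i in range(1, len(path) + 1):
        total += edge_weight(path[:i])
    return total

def tree_value(n, m, edge_weight, spade_weight):
    """Minimal path weight over all m**n root-to-spade paths."""
    return min(path_weight(p, edge_weight, spade_weight)
               for p in product(range(1, m + 1), repeat=n))

# toy weighting: an edge into a vertex whose last label is a weighs 0.5 * a
toy_value = tree_value(3, 2, lambda prefix: 0.5 * prefix[-1], lambda path: 0.0)
```

The exhaustive minimum over $m^n$ paths is only feasible for toy sizes; it is meant to mirror the definition, not to be efficient.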

6.2 Recombining 

To show that the original state $\rho_{X^nE}$ has a large smooth min-entropy $H^\varepsilon_{\min}(X_S|E)$ for a randomly selected subset $S \subset [n]$, we will now study how the split states can be recombined. More precisely, we are interested in properties of states $|\bar\Psi\rangle$ that are obtained by summing up states $|\Psi^{\alpha^n}\rangle$ corresponding to a subset $\Gamma \subset [m]^n$ of leaves of the tree in Figure 2.

In Section 6.2.1, we discuss how such a recombined state can be defined recursively, starting from the bottom of the tree. We then use the corresponding intermediate states in Section 6.2.2 to analyse how a judicious choice of $\Gamma$ yields a recombined state $|\bar\Psi\rangle$ with a large min-entropy $H_{\min}(X_S|E)_{\bar\rho}$.

6.2.1 Partially recombined states and properties 

We are interested in properties of the state

$$|\bar\Psi\rangle = \sum_{\alpha^n\in\Gamma} |\Psi^{\alpha^n}\rangle \tag{48}$$

obtained by summing over a certain subset $\Gamma \subset [m]^n$ of paths. To analyse such a "partially recombined" state, we will consider intermediate states attached to a tree. The state $|\bar\Psi\rangle$ will sit at the root of the tree. We will refer to it as $|\bar\Psi\rangle = |\bar\Psi^{\alpha^0}\rangle$ in the following definition, which we illustrate in Figure 5.

Definition 6.5 ("Recombined states"). Let $\Gamma \subset [m]^n$ be arbitrary, and let $|\Psi^{\alpha^n}\rangle$ for $\alpha^n \in [m]^n$ be the split states introduced in Definition 6.1. We define the recombined states

$$|\bar\Psi^{\alpha^j}\rangle := \sum_{\gamma^n\in\Gamma :\ \gamma^j = \alpha^j} |\Psi^{\gamma^n}\rangle$$

and let $\bar\rho^{\alpha^j}_{X^nER} = |\bar\Psi^{\alpha^j}\rangle\langle\bar\Psi^{\alpha^j}|$ for all $\alpha^j \in [m]^j$. For simplicity, we omit $\Gamma$ in the notation.

Not surprisingly, the recombined states inherit many properties of the split states. The following lemma summarises these, and is the analog of Lemma 6.3.



Figure 5: Here we illustrate the partially recombined states of Definition 6.5 for $n = 3$, $m = 2$ and $\Gamma = \{112, 211, 212, 222\}$. We again associate every state $|\bar\Psi^{\alpha^j}\rangle$ with the node carrying the label $\alpha^j \in [m]^j$. We start by defining the leaves, i.e., the states $|\bar\Psi^{\alpha^n}\rangle$ for $\alpha^n \in [m]^n$: For $\alpha^n \in \Gamma$ (illustrated by triangles), we use the same leaves as in Figure 2, i.e., we set $|\bar\Psi^{\alpha^n}\rangle = |\Psi^{\alpha^n}\rangle$. On the other hand, we set $|\bar\Psi^{\alpha^n}\rangle = 0$ for $\alpha^n \notin \Gamma$. We then work our way up the tree, defining the state in each node as the sum of its immediate descendants. The elements at the dotted nodes are equal to zero, whereas for example $|\bar\Psi^{2}\rangle = |\bar\Psi^{21}\rangle + |\bar\Psi^{22}\rangle = |\Psi^{211}\rangle + |\Psi^{212}\rangle + |\Psi^{222}\rangle$. Clearly, the state at the root is equal to the sum of the leaves in $\Gamma$, i.e., $|\bar\Psi\rangle = \sum_{\alpha^n\in\Gamma} |\Psi^{\alpha^n}\rangle$. We will show in Lemma 6.6 that the states at any given level are orthogonal, and that movement in this diagram is achieved by the same projection operators as in the tree of Figure 2. Moreover, the entropies of interest corresponding to this modified tree are at least as large as those corresponding to Figure 2.
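
The bookkeeping behind the partially recombined states can be sketched as follows. The sketch only tracks which leaves contribute to a node, not the quantum states themselves; labels are written as strings read from the root downwards, and the helper name `leaves_below` is a hypothetical choice.

```python
from itertools import product

def leaves_below(prefix, gamma):
    """Leaves in `gamma` that lie in the subtree of the node labeled `prefix`."""
    return sorted(g for g in gamma if g[:len(prefix)] == prefix)

gamma = {"112", "211", "212", "222"}   # the set used in the caption of Figure 5
n = 3

# for every node, list the leaves whose sum defines the recombined state there;
# an empty list corresponds to a node carrying the zero vector
contributions = {
    prefix: leaves_below(prefix, gamma)
    for j in range(n + 1)
    for prefix in map("".join, product("12", repeat=j))
}
```

For instance, the node labeled `"2"` collects the leaves `211`, `212` and `222`, while the node labeled `"12"` collects none.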



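Lemma 6.6. The recombined states have the following properties.

(i) The states $\{|\bar\Psi^{\alpha^j}\rangle\}_{\alpha^j\in[m]^j}$ are orthogonal for a fixed $j \in [n]$.

(ii) The states form a resolution of $|\bar\Psi^{\alpha^{j-1}}\rangle$, i.e., $\sum_{\alpha_j\in[m]} |\bar\Psi^{\alpha^j}\rangle = |\bar\Psi^{\alpha^{j-1}}\rangle$.

(iii) The states satisfy the recursion relation $|\bar\Psi^{\alpha^j}\rangle = Q^{\alpha^j}_{X_{\le j}R}|\bar\Psi^{\alpha^{j-1}}\rangle$ for all $j \in [n]$ and $\alpha^j \in [m]^j$.

(iv) For every $j = 0, \ldots, n$, there is a projector $\Gamma^{\alpha^j}_{X_{\le n}R}$ such that $|\bar\Psi^{\alpha^j}\rangle = \Gamma^{\alpha^j}_{X_{\le n}R}|\Psi\rangle$. In particular, for $\alpha = \alpha^j$ arbitrary, we have $H(\emptyset|E)_{\frac{\bar\rho^\alpha}{\sigma}} \ge H(\emptyset|E)_{\frac{\rho}{\sigma}}$.

(v) We have $H(X_j|X_{>j}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}} \ge H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^j}}{\rho^{\alpha^j}}}$ for all $j \in [n]$ and $\alpha^j \in [m]^j$.

(vi) For all $\alpha^n \in [m]^n$, we have $H(\emptyset|E)_{\frac{\bar\rho^{\alpha^n}}{\sigma}} \ge H(\emptyset|E)_{\frac{\rho^{\alpha^n}}{\sigma}}$.

The recursion relation (iii) will be most important in our analysis. It provides a means of studying properties of the corresponding states in a recursive manner, moving up the tree in Figure 5 to the root.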

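Proof. The orthogonality (i) of the states $\{|\bar\Psi^{\alpha^j}\rangle\}_{\alpha^j}$ is a direct consequence of the orthogonality of the states $\{|\Psi^{\gamma^n}\rangle\}_{\gamma^n}$ (cf. Lemma 6.3 (i)). Identity (ii) also follows from the definition of $|\bar\Psi^{\alpha^j}\rangle$. For the proof of (iii), observe that

$$Q^{\alpha^j}_{X_{\le j}R}|\Psi^{\gamma^n}\rangle = \begin{cases} |\Psi^{\gamma^n}\rangle & \text{if } \gamma^j = \alpha^j \\ 0 & \text{otherwise} , \end{cases}$$

by Lemma 6.2. Applying this to compute $Q^{\alpha^j}_{X_{\le j}R}|\bar\Psi^{\alpha^{j-1}}\rangle$ immediately gives the claim (iii).

Defining $\Gamma_{X_{\le n}R} = \sum_{\gamma^n\in\Gamma} Q^{\gamma^n}_{X_{\le n}R}$ and using the fact that $|\bar\Psi\rangle = \sum_{\gamma^n\in\Gamma} |\Psi^{\gamma^n}\rangle$ and $|\Psi\rangle = \sum_{\alpha^n} |\Psi^{\alpha^n}\rangle$ proves the first part of (iv) because of Lemma 6.2 and Lemma 6.3. The second part of (iv) follows from Lemma 5.3.

Next we prove (v). Note that the statement holds trivially for $j = n$ and $\alpha^n \in \Gamma$, since in this case $|\bar\Psi^{\alpha^n}\rangle = |\Psi^{\alpha^n}\rangle$ by Definition 6.5. If $j = n$ and $\alpha^n \notin \Gamma$, then $|\bar\Psi^{\alpha^n}\rangle = 0$ and $H(X_j|X_{>j}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}} = \infty$ by definition, hence (v) also holds in this case. Assume now that $j < n$. We have $|\bar\Psi^{\alpha^j}\rangle = R_{X_{>j}E}|\Psi^{\alpha^j}\rangle$ for the operator

$$R_{X_{>j}E} = \sum_{\gamma^n\in\Gamma :\ \gamma^j = \alpha^j} \tilde{P}^{\gamma^n}_{X_{>n}E} \cdots \tilde{P}^{\gamma^{j+1}}_{X_{>j+1}E} ,$$

by Definition 6.5 and (44). The claim (v) therefore follows from Lemma 5.3.

For the proof of statement (vi), we again use the fact that $|\bar\Psi^{\alpha^n}\rangle = |\Psi^{\alpha^n}\rangle$ if $\alpha^n \in \Gamma$ and $|\bar\Psi^{\alpha^n}\rangle = 0$ otherwise. In particular, we have $H(\emptyset|E)_{\frac{\bar\rho^{\alpha^n}}{\sigma}} = H(\emptyset|E)_{\frac{\rho^{\alpha^n}}{\sigma}}$ in the former and $H(\emptyset|E)_{\frac{\bar\rho^{\alpha^n}}{\sigma}} = \infty$ in the latter case. Hence the claim (vi) follows. □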

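We next prove an analog of the basic recombination lemma (Lemma 5.8) for the partially recombined states $|\bar\Psi^{\alpha^j}\rangle$. In terms of the position of the corresponding states in the described tree, it expresses the fact that the entropies of interest do not decrease significantly when we move from one level up to another level closer to the root.

Lemma 6.7. For all $A \subset [n]$ (possibly empty), $i \in [n]$ and $\alpha^{i-1} \in [m]^{i-1}$

$$\min_{\alpha_i\in[m]} H(X_A|E)_{\frac{\bar\rho^{\alpha^i}}{\sigma}} - 2\log m \le H(X_A|E)_{\frac{\bar\rho^{\alpha^{i-1}}}{\sigma}} .$$

Proof. Let $\alpha^{i-1} \in [m]^{i-1}$ be fixed and let $\lambda := 2^{-\min_{\alpha_i\in[m]} H(X_A|E)_{\frac{\bar\rho^{\alpha^i}}{\sigma}}}$, where $\alpha^i = (\alpha_i, \alpha^{i-1})$. By definition

$$\frac{\bar\rho^{\alpha^i}_{X_AE}}{\sigma_E} \le \lambda\, \mathrm{id}_{X_AE} \quad \text{for all } \alpha_i \in [m] . \tag{49}$$

To relate this to $\bar\rho^{\alpha^{i-1}}_{X_AE}$, we use the recursion relation (iii) of Lemma 6.6 to rewrite (49) as

$$\mathrm{tr}_{\overline{X_AE}}\Big(Q^{\alpha^i}_{X_{\le i}R}\, \frac{\bar\rho^{\alpha^{i-1}}}{\sigma_E}\, Q^{\alpha^i}_{X_{\le i}R}\Big) \le \lambda\, \mathrm{id}_{X_AE} \quad \text{for all } \alpha_i \in [m] ,$$

where $\mathrm{tr}_{\overline{X_AE}}$ denotes the partial trace over all systems other than $X_AE$. Lemma B.1 thus implies

$$\mathrm{tr}_{\overline{X_AE}}\Big(Q\, \frac{\bar\rho^{\alpha^{i-1}}}{\sigma_E}\, Q\Big) \le \lambda m^2\, \mathrm{id}_{X_AE} , \tag{50}$$

where $Q = \sum_{\alpha_i\in[m]} Q^{\alpha^i}_{X_{\le i}R}$. But

$$Q|\bar\Psi^{\alpha^{i-1}}\rangle = \sum_{\alpha_i\in[m]} |\bar\Psi^{\alpha^i}\rangle = |\bar\Psi^{\alpha^{i-1}}\rangle , \tag{51}$$

where we used (iii) and (ii) of Lemma 6.6. Inserting this into (50) gives

$$\frac{\bar\rho^{\alpha^{i-1}}_{X_AE}}{\sigma_E} \le \lambda m^2\, \mathrm{id}_{X_AE} ,$$

which concludes the proof. □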
[Figure: the edge into the vertex $\alpha^i$ carries the weight $H(X_i|X_{>i}E)_{\frac{\rho^{\alpha^i}}{\rho^{\alpha^i}}}$ for $i \in S$ and $0$ for $i \notin S$.]

Figure 6: The weighting $v^S_\rho$ of the edges of $T_n$. The weighting $v^S_{\bar\rho}$ is defined analogously, with $\rho^{\alpha^i}$ replaced by $\bar\rho^{\alpha^i}$.

6.2.2 Recombining high-entropy components 

We now study the entropies associated with recombined states, in the special case where $\Gamma \subset [m]^n$ is chosen as the set of "high-entropy paths" for a subset $S$. Our main result of this section is Theorem 6.13, which expresses the fact that the corresponding entropy $H_{\min}(X_S|E)_{\bar\rho}$ is large.

Let us fix a subset $S \subset [n]$. We will be interested in the entropies of variables $X_i$ with $i \in S$. That is, we consider the weighting $v^S_\rho$ defined by Figure 6 of the tree $T_n$ introduced after Theorem 6.4. A given path $\alpha^n \in [m]^n$ in $T_n$ then has weight

$$v^S_\rho(T_n, \alpha^n) = H(\emptyset|E)_{\frac{\rho^{\alpha^n}}{\sigma}} + \sum_{i\in S} H(X_i|X_{>i}E)_{\frac{\rho^{\alpha^i}}{\rho^{\alpha^i}}}$$

by definition. We cannot expect this to be large for all $\alpha^n \in [m]^n$; in particular, the value $v^S_\rho(T_n)$ will in general be small. We therefore introduce the following sets.

Definition 6.8 ("$\lambda$-good paths"). For $\lambda > 0$ and $S \subset [n]$, let $\Gamma(\lambda, S) \subset [m]^n$ be the set of $n$-tuples $\alpha^n \in [m]^n$ with

$$\frac{v^S_\rho(T_n, \alpha^n)}{|S|\log|\mathcal{X}|} \ge \lambda .$$

We call $\Gamma(\lambda, S) \subset [m]^n$ the set of $\lambda$-good paths for $S$.

The choice of the normalisation factor $|S|\log|\mathcal{X}|$ will become clearer in the sequel when we relate the quantity to the entropy rate of $X_S$, i.e., its entropy divided by $|S|\log|\mathcal{X}|$.

Let us consider states that arise when recombining only $\lambda$-good paths. That is, we fix $\lambda > 0$, a subset $S \subset [n]$ of size $|S| = r$, and let $\Gamma = \Gamma(\lambda, S)$ be the set of $n$-tuples specified by Definition 6.8. We then define the partially recombined states $\{\bar\rho^{\alpha^j}\}$ as in Definition 6.5.

Note that the recombined states give rise to a weighting $v^S_{\bar\rho}$ of the tree $T_n$ as in Figure 6. Contrary to the original weighting $v^S_\rho$, this weighting assigns a large weight to every path. That is, we have the statement

Lemma 6.9. $v^S_{\bar\rho}(T_n) \ge \lambda|S|\log|\mathcal{X}|$.

In other words, when considering the recombined states, all paths are $\lambda$-good. This is not the case for the original split states.

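Proof. Suppose first that $\alpha^n \in \Gamma(\lambda, S) \subset [m]^n$. Then

$$H(X_j|X_{>j}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}} \ge H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^j}}{\rho^{\alpha^j}}} \quad \text{for all } j \in [n] \text{ by Lemma 6.6 (v), and}$$

$$H(\emptyset|E)_{\frac{\bar\rho^{\alpha^n}}{\sigma}} \ge H(\emptyset|E)_{\frac{\rho^{\alpha^n}}{\sigma}} \quad \text{by Lemma 6.6 (vi).}$$

This directly gives $v^S_{\bar\rho}(T_n, \alpha^n) \ge v^S_\rho(T_n, \alpha^n) \ge \lambda|S|\log|\mathcal{X}|$ for $\alpha^n \in \Gamma(\lambda, S)$. On the other hand, if $\alpha^n \notin \Gamma(\lambda, S)$, then we have $\bar\rho^{\alpha^n} = 0$, which implies that $H(\emptyset|E)_{\frac{\bar\rho^{\alpha^n}}{\sigma}} = \infty$ and thus $v^S_{\bar\rho}(T_n, \alpha^n) = \infty$. The claim follows by taking the minimum over $\alpha^n \in [m]^n$. □

Footnote (to the definition of $v^S_\rho(T_n, \alpha^n)$): Observe that we now explicitly mention the dependence on the tree $T_n$ in $v_\rho(T_n, \alpha^n)$, as we will be dealing with several different (sub)trees.

[Figure: the edge into the vertex $\alpha^i$ carries the weight $H(X_i|X_{>i\cap S}E)_{\frac{\bar\rho^{\alpha^i}}{\bar\rho^{\alpha^i}}}$ for $i \in S$ and $0$ for $i \notin S$; the edge to the spade below a vertex $\alpha^i$ carries $H(X_{>i\cap S}|E)_{\frac{\bar\rho^{\alpha^i}}{\sigma}}$, for all $i \in \{0, \ldots, n\}$.]

Figure 7: The weighting $w^S_{\bar\rho}$.

Figure 8: The tree $T_0$.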
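
The effect of restricting to good paths can be illustrated with a toy computation. The sketch assumes that path weights are given as a dictionary and that logarithms are taken to base 2; the function names and the numbers are hypothetical. For simplicity, the recombined weight of a good path is set equal to its original weight, which is the smallest value the statement above allows.

```python
import math

def good_paths(weights, lam, size_s, alphabet_size):
    """Paths whose weight is at least lam * |S| * log|X|."""
    threshold = lam * size_s * math.log2(alphabet_size)
    return {p for p, w in weights.items() if w >= threshold}

def recombined_value(weights, gamma):
    """Tree value when every path outside gamma gets weight infinity."""
    return min(w if p in gamma else math.inf for p, w in weights.items())

# toy weights for m = 2, n = 2 (made-up numbers)
weights = {(1, 1): 0.5, (1, 2): 3.0, (2, 1): 2.0, (2, 2): 4.0}
lam, size_s, alphabet_size = 0.5, 2, 4       # threshold = 0.5 * 2 * 2 = 2.0

gamma = good_paths(weights, lam, size_s, alphabet_size)
original = min(weights.values())             # small: one path is not good
recombined = recombined_value(weights, gamma)
```

The original tree value is dominated by the single bad path, whereas the recombined value meets the threshold.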

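Next we apply subadditivity, to go from the weighting $v^S_{\bar\rho}$ defined by Figure 6 to the weighting $w^S_{\bar\rho}$ introduced in Figure 7. This weighting assigns the weight

$$w^S_{\bar\rho}(T_n, \alpha^n) = H(\emptyset|E)_{\frac{\bar\rho^{\alpha^n}}{\sigma}} + \sum_{i\in S} H(X_i|X_{>i\cap S}E)_{\frac{\bar\rho^{\alpha^i}}{\bar\rho^{\alpha^i}}}$$

to a path $\alpha^n$ in the tree $T_n$. We then have the inequality

Lemma 6.10. $w^S_{\bar\rho}(T_n) \ge v^S_{\bar\rho}(T_n)$.

Proof. With subadditivity (Lemma 5.2 (iii)), it is straightforward to show that

$$H(X_j|X_{>j\cap S}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}} \ge H(X_j|X_{>j}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}}$$

for all $j \in S$ and $\alpha^j \in [m]^j$. The statement follows immediately. □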

Our aim is to show that if every path is $\lambda$-good for some $\lambda$, then the entropy $H(X_S|E)$ is large for the recombined state $\bar\rho^{\alpha^0}$. This expression can be seen as the value of the tree $T_0$ which is defined in Figure 8, i.e., we have

$$w^S_{\bar\rho}(T_0) = H(X_S|E)_{\frac{\bar\rho^{\alpha^0}}{\sigma}} . \tag{52}$$



To obtain an estimate on this quantity, we use a sequence of intermediate trees and show the following:

Lemma 6.11. There is a sequence $T_{n-1}, \ldots, T_1$ of intermediate trees such that

$$w^S_{\bar\rho}(T_{j-1}) \ge w^S_{\bar\rho}(T_j) - 2\log m \quad \text{for all } j \in [n] , \tag{53}$$

where $T_0$ is the tree in Figure 8 and $T_n$ is the original tree (see Figure 3). In particular,

$$w^S_{\bar\rho}(T_0) \ge w^S_{\bar\rho}(T_n) - 2n\log m . \tag{54}$$

Here $X_{>j\cap S}$ denotes $(X_i)_{i\in S,\ i>j}$ (this is equal to $\emptyset$ if $j \ge n$). In these expressions, the value of the tree $T_j$ is equal to

$$w^S_{\bar\rho}(T_j) = \min_{\alpha^j\in[m]^j} w^S_{\bar\rho}(T_j, \alpha^j) \quad \text{where}$$
$$w^S_{\bar\rho}(T_j, \alpha^j) = H(X_{>j\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}} + \sum_{i\in S,\ i\le j} H(X_i|X_{>i\cap S}E)_{\frac{\bar\rho^{\alpha^i}}{\bar\rho^{\alpha^i}}} . \tag{55}$$

[Figure 9, left panel: To obtain $T_{j-1}$ from $T_j$, the subtree defined by a vertex $\alpha^{j-1}$ at level $j$ is substituted as shown, for all $\alpha^{j-1} \in [m]^{j-1}$. Note that the vertex $\alpha^{j-1}$ has (in general) $m$ direct descendants; the figure corresponds to $m = 2$. Right panel: The tree $T_2$ obtained by applying the substitution rule to the tree $T_3$ of Figure 3.]

Figure 9: The substitution rule for obtaining $T_{j-1}$ from $T_j$, $j \in [n]$. The tree $T_j$ has depth $j + 1$, with spades sitting on the $(j+1)$-st level.

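Proof. Note that (54) follows immediately from (53).

We first define the sequence of trees $T_{n-1}, T_{n-2}, \ldots, T_0$. We do this inductively as shown in Figure 9, that is, we obtain $T_{j-1}$ from $T_j$ by substituting subtrees corresponding to vertices $\alpha^{j-1} \in [m]^{j-1}$. Clearly, $T_j$ is a tree characterised as follows: For every $0 \le k < j$, every vertex at level $k$ has $m$ immediate descendants, whereas each vertex at level $j$ has one descendant which is a spade.

The tree $T_0$ defined recursively in this way coincides with the definition given above (Figure 8). Also, it is easy to see that the value of the tree $T_j$ is given by (55). We prove the central inequality (53).

By definition, it suffices to prove that for all $\alpha^{j-1} \in [m]^{j-1}$, there is an $\alpha_j \in [m]$ such that

$$w^S_{\bar\rho}(T_{j-1}, \alpha^{j-1}) \ge w^S_{\bar\rho}(T_j, \alpha^j) - 2\log m ,$$

or equivalently

$$\delta := \min_{\alpha_j\in[m]} w^S_{\bar\rho}(T_j, \alpha^j) - w^S_{\bar\rho}(T_{j-1}, \alpha^{j-1}) \le 2\log m . \tag{56}$$

Since the two paths to the vertex $\alpha^{j-1}$ are identical in $T_j$ and $T_{j-1}$, the expression on the lhs is equal to

$$\delta = w^S_{\bar\rho}(A) - w^S_{\bar\rho}(B) , \tag{57}$$

where $A$ and $B$ are the subtrees defined by $\alpha^{j-1}$ on the left in Figure 9. By definition, we have

$$w^S_{\bar\rho}(A) = \begin{cases} \min_{\alpha_j\in[m]} H(X_{>j\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}} & \text{if } j \notin S \\ \min_{\alpha_j\in[m]} \Big(H(X_{>j\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}} + H(X_j|X_{>j\cap S}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}}\Big) & \text{if } j \in S \end{cases}$$

and, by (55), $w^S_{\bar\rho}(B) = H(X_{>j-1\cap S}|E)_{\frac{\bar\rho^{\alpha^{j-1}}}{\sigma}}$. We thus have to consider two cases.

(i) If $j \notin S$, then $X_{>j\cap S} = X_{>j-1\cap S}$ and (56) follows with (57) once we show that

$$\min_{\alpha_j\in[m]} H(X_{>j\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}} - H(X_{>j\cap S}|E)_{\frac{\bar\rho^{\alpha^{j-1}}}{\sigma}} \le 2\log m .$$

This was shown in Lemma 6.7.

(ii) If $j \in S$, we have

$$\delta = \min_{\alpha_j\in[m]} \Big(H(X_{>j\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}} + H(X_j|X_{>j\cap S}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}}\Big) - H(X_{>j-1\cap S}|E)_{\frac{\bar\rho^{\alpha^{j-1}}}{\sigma}} .$$

By the chain rule (Lemma 5.2 (iv)), we have (observe that $X_{>j-1\cap S} = X_jX_{>j\cap S}$)

$$H(X_{>j\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}} + H(X_j|X_{>j\cap S}E)_{\frac{\bar\rho^{\alpha^j}}{\bar\rho^{\alpha^j}}} \le H(X_{>j-1\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}}$$

and thus

$$\delta \le \min_{\alpha_j\in[m]} \Big(H(X_{>j-1\cap S}|E)_{\frac{\bar\rho^{\alpha^j}}{\sigma}} - H(X_{>j-1\cap S}|E)_{\frac{\bar\rho^{\alpha^{j-1}}}{\sigma}}\Big) .$$

The claim (56) again follows from Lemma 6.7. □

In summary, we have shown the following: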

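Lemma 6.12. Let $S \subset [n]$ be arbitrary and let $\Gamma(\lambda, S) \subset [m]^n$ be the set of $\lambda$-good paths for $S$ as in Definition 6.8. Let $\{\bar\rho^{\alpha^j}\}$ be the corresponding partially recombined states as in Definition 6.5. Then

$$\frac{H(X_S|E)_{\frac{\bar\rho}{\sigma}}}{|S|\log|\mathcal{X}|} \ge \lambda - \frac{2n\log m}{|S|\log|\mathcal{X}|} .$$

Proof. We have

$$H(X_S|E)_{\frac{\bar\rho}{\sigma}} = w^S_{\bar\rho}(T_0) \quad \text{by (52),}$$
$$\ge w^S_{\bar\rho}(T_n) - 2n\log m \quad \text{by (54),}$$
$$\ge v^S_{\bar\rho}(T_n) - 2n\log m \quad \text{by Lemma 6.10, and}$$
$$\ge \lambda|S|\log|\mathcal{X}| - 2n\log m \quad \text{by Lemma 6.9.}$$

□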
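
To get a feeling for the size of the correction term, the following sketch evaluates the right-hand side of the bound in Lemma 6.12 for given parameters. Logarithms are taken to base 2 here, and the function name and the sample parameters are assumptions made for illustration.

```python
import math

def rate_lower_bound(lam, n, m, r, alphabet_size):
    """lam - 2 n log(m) / (r log|X|), with r = |S| and logs to base 2."""
    return lam - 2 * n * math.log2(m) / (r * math.log2(alphabet_size))

# hypothetical parameters: n = 100 positions, m = 2, |S| = 50, 16-bit symbols
bound = rate_lower_bound(lam=0.5, n=100, m=2, r=50, alphabet_size=2 ** 16)
```

With these numbers the correction is $2 \cdot 100 / (50 \cdot 16) = 0.25$, so the bound is only meaningful when $\log|\mathcal{X}|$ is large compared to $\frac{n}{|S|}\log m$.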

We have shown that when recombining only $\lambda$-good paths, one ends up with a state with high entropy on the subset $S$ of systems of interest. The recombined state can, however, be far from the original state, if only a few paths are $\lambda$-good (or more precisely, if the share of the $\lambda$-good paths is small). We express this as follows.

Theorem 6.13 ("Recombining"). There is a probability distribution $\omega$ on $[m]^n$ such that for any subset $S \subset [n]$, there is a subnormalised state $\bar\rho_{X^nER}$ with

$$\frac{H(X_S|E)_{\frac{\bar\rho}{\sigma}}}{|S|\log|\mathcal{X}|} \ge \lambda - \frac{2n\log m}{|S|\log|\mathcal{X}|} ,$$

$$H(\emptyset|E)_{\frac{\bar\rho}{\sigma}} \ge H(\emptyset|E)_{\frac{\rho}{\sigma}}$$

at distance

$$\frac{1}{2}\|\rho_{X^nER} - \bar\rho_{X^nER}\| \le \sqrt{1 - \omega(\Gamma(\lambda, S))}$$

from the original state $\rho_{X^nER}$, where $\Gamma(\lambda, S) \subset [m]^n$ is the set of paths $\alpha^n \in [m]^n$ such that

$$H(\emptyset|E)_{\frac{\rho^{\alpha}}{\sigma}} + \sum_{j\in S} H(X_j|X_{>j}E)_{\frac{\rho^{\alpha^j}}{\rho^{\alpha^j}}} \ge \lambda|S|\log|\mathcal{X}| \quad \text{for all } \alpha = \alpha^n \in \Gamma(\lambda, S) . \tag{58}$$

Proof. Let $\omega$ be the probability distribution introduced in Lemma 6.3. We set $\bar\rho = \bar\rho^{\alpha^0}$ equal to the partially recombined state.

The first bound was derived in Lemma 6.12. The second bound is identical to the claim (iv) of Lemma 6.6 for $j = 0$. For the bound on the distance between $\rho$ and $\bar\rho$, we use the fact that $\bar\rho = Q\rho Q$, where $Q = \sum_{\gamma^n\in\Gamma} Q^{\gamma^n}_{X_{\le n}R}$ is a projector (cf. Lemma 6.2). Applying the gentle measurement lemma [Win99, ON02]

$$\frac{1}{2}\|\rho - Q\rho Q\| \le \sqrt{\mathrm{tr}(\rho) - \mathrm{tr}(Q^2\rho)} \quad \text{for all subnormalised } \rho \text{ and } 0 \le Q \le \mathrm{id}$$

gives the claim. □



6.3 Averaging samplers and parallel samplers 

To argue that an averaging sampler picks $\lambda$-good paths with high probability, it will be necessary to analyse the behavior of a sampler with respect to values attached to a tree. For simplicity, we consider an even simpler situation (which is more general and sufficient for our purposes): We think of values arranged in a matrix, and introduce the concept of a parallel sampler.

Consider a modified sampler situation, where instead of a single vector $\beta = (\beta_1, \ldots, \beta_n) \in [0,1]^n$, a family $\{\beta^\alpha = (\beta_1^\alpha, \ldots, \beta_n^\alpha) \in [0,1]^n\}_{\alpha\in[M]}$ of $M$ vectors is given. We would like to approximate the values $\bar{\beta}^\alpha = \frac{1}{n}\sum_{i=1}^{n}\beta_i^\alpha$ simultaneously by expressions of the form $\frac{1}{|S|}\sum_{i\in S}\beta_i^\alpha$. Clearly, a single (small) subset $S \subset [n]$ will generally not give a good approximation for each one of the $M$ vectors. However, it is possible to guarantee that it does so for most vectors, in the following sense.

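Definition 6.14. Let $M, n \in \mathbb{N}$. For any subset $S \subset [n]$, matrix $\beta = (\beta_i^\alpha)_{\alpha\in[M], i\in[n]} \in [0,1]^{M\times n}$ and $\xi \in [0,1]$, let $B(\beta,S,\xi) \subset [M]$ be the set of $\alpha \in [M]$ such that

$$\frac{1}{|S|}\sum_{i\in S}\beta_i^\alpha \leq \frac{1}{n}\sum_{i=1}^{n}\beta_i^\alpha - \xi .$$

A $(M,n,\xi,\delta,\varepsilon)$-parallel sampler is a distribution over subsets $S$ of $[n]$ with the property that for every fixed probability distribution $\omega$ on $[M]$,

$$\Pr_S[\omega(B(\beta,S,\xi)) \geq \delta] \leq \varepsilon \qquad \text{for all } \beta = (\beta_i^\alpha)_{\alpha\in[M], i\in[n]} \in [0,1]^{M\times n} .$$

A $(n,\xi,\delta,\varepsilon)$-parallel sampler is a $(M,n,\xi,\delta,\varepsilon)$-parallel sampler for any $M \in \mathbb{N}$.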
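As a concrete illustration of Definition 6.14, the following Python sketch computes the set $B(\beta,S,\xi)$ and its weight under a distribution $\omega$ for a small matrix. The function names `bad_set` and `weight` and the example data are hypothetical choices made only for this illustration; exact rational arithmetic is used so that the comparison with $\xi$ is not affected by rounding.

```python
from fractions import Fraction

def bad_set(beta, S, xi):
    """Return the set B(beta, S, xi): all row indices alpha for which the
    average over the sample S undershoots the full average by at least xi."""
    n = len(beta[0])
    bad = set()
    for alpha, row in enumerate(beta):
        full_mean = Fraction(sum(row), n)
        sample_mean = Fraction(sum(row[i] for i in S), len(S))
        if sample_mean <= full_mean - xi:
            bad.add(alpha)
    return bad

def weight(omega, subset):
    """Weight omega(subset) of a set of row indices under the distribution omega."""
    return sum((omega[a] for a in subset), Fraction(0))

# Hypothetical example: M = 3 rows, n = 4 columns, entries in [0, 1].
beta = [[1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 1, 0]]
omega = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
S = [0, 1]
xi = Fraction(1, 4)

B = bad_set(beta, S, xi)
print(sorted(B), weight(omega, B))
```

In this toy example only the second row is badly approximated by the sample, so the subset $S$ would count as acceptable for any $\delta$ larger than the weight of that row.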
Clearly, a "standard" sampler corresponds to $M = 1$. In our application, the matrices $\beta \in [0,1]^{M\times n}$ will not be arbitrary, but have a lot of redundancy. This could perhaps be exploited to find better constructions; however, for our purposes, a parallel sampler is sufficient.

We now use Markov's inequality to obtain the following generic construction of a parallel sampler; again, better constructions may be possible, but the following one is sufficient for our considerations.

Lemma 6.15. A $(n,\xi,\varepsilon)$-sampler is a $(n,\xi,\sqrt{\varepsilon},\sqrt{\varepsilon})$-parallel sampler.

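Proof. Let $M \in \mathbb{N}$ be arbitrary. Fix a probability distribution $\omega$ on $[M]$ and let $\beta = (\beta_i^\alpha)_{\alpha\in[M], i\in[n]} \in [0,1]^{M\times n}$ be arbitrary. Since the probability on the left-hand side of the sampler condition is bounded by $\varepsilon$ for each vector $(\beta_1^\alpha, \ldots, \beta_n^\alpha)$ with $\alpha \in [M]$, it is also bounded if we choose $\alpha$ independently according to $\omega$. That is, we have

$$\varepsilon \geq \Pr_{S,\alpha}[\alpha \in B(\beta,S,\xi)] = \mathbb{E}_S\big[\Pr_\alpha[\alpha \in B(\beta,S,\xi)]\big] .$$

Markov's inequality $\Pr[Z \geq c] \leq \mathbb{E}[Z]/c$ with $c = \sqrt{\varepsilon}$ applied to the random variable $Z(S) = \Pr_\alpha[\alpha \in B(\beta,S,\xi)]$ immediately gives the claim. □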
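The Markov step can be checked exhaustively on a toy instance. The sketch below assumes the sampler is a uniformly random subset of size $r$ and enumerates all such subsets; `is_bad` and `markov_check` are hypothetical helper names. Note that the value `eps` computed here is the worst failure probability over the rows of the given matrix only, not the supremum over all vectors that appears in the definition of a sampler, which is all the Markov argument needs.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

def is_bad(row, S, xi):
    """True if the sample average over S undershoots the full average by at least xi."""
    return Fraction(sum(row[i] for i in S), len(S)) <= Fraction(sum(row), len(row)) - xi

def markov_check(beta, omega, r, xi):
    """Return (eps, p) for uniformly random subsets S of size r, where
    eps = max over rows of Pr_S[row is bad] and
    p   = Pr_S[omega-weight of the bad rows >= sqrt(eps)]."""
    n = len(beta[0])
    subsets = list(combinations(range(n), r))
    eps = max(Fraction(sum(is_bad(row, S, xi) for S in subsets), len(subsets))
              for row in beta)
    if eps == 0:
        # No row is ever bad, so the bad set always has weight zero.
        return eps, Fraction(0)
    threshold = sqrt(eps)
    heavy = 0
    for S in subsets:
        w = sum((omega[a] for a, row in enumerate(beta) if is_bad(row, S, xi)),
                Fraction(0))
        if w >= threshold:
            heavy += 1
    return eps, Fraction(heavy, len(subsets))

# Hypothetical example data (same shape as in Definition 6.14).
beta = [[1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 1, 0]]
omega = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

eps, p = markov_check(beta, omega, r=2, xi=Fraction(1, 4))
print(eps, p, float(p) <= sqrt(eps))
```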

6.4 Sampling A-good paths 

We now apply the concept of a parallel sampler to the situation of interest. Recall Definition 6.8 of the set $\Gamma(\lambda,S) \subset [m]^n$ of $\lambda$-good paths for every $\lambda \geq 0$ and $S \subset [n]$. We show that for an appropriate choice of $\lambda$, and a fixed probability distribution $\omega$ on $[m]^n$, the weight of the $\lambda$-good paths for $S$ is large with high probability if $S \subset [n]$ is a random subset which is a parallel sampler.

Theorem 6.16 ("Sampling"). Let $\omega$ be an arbitrary probability distribution on $[m]^n$. Let $P_S$ be a probability distribution over subsets of $[n]$ which is a $(n,\xi,\delta,\varepsilon)$-parallel sampler. Then

$$\Pr_S[\omega(\Gamma(\lambda,S)) \geq 1-\delta] \geq 1-\varepsilon \qquad \text{for}$$

_ g(X>o|g). n-\S\ 1 

nloglA-l +|5|nlog|A'|^^®'^^^"^TO+^^ ' 

where $\Gamma(\lambda,S)$ is the set of $\lambda$-good paths as in Definition 6.8, i.e., the set of $\alpha^n \in [m]^n$ with

77(0|i?)^+^i/(X,|X>,£;)^„. >A|5|log|A'| . 

' jes 

Proof. For every $j \in [n]$ and $\alpha^n \in [m]^n$, we define the quantity

$$\beta_j^{\alpha^n} := \frac{H(X_j|X_{>j}E)_{\rho^{\alpha^j}}}{\log|\mathcal{X}|} .$$



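(Note that this depends only on the first $j$ entries of $\alpha^n$.) Observe that we have $\beta_j^{\alpha^n} \in [0,1]$ by the dimension bound (Lemma 5.2 (iii)). By Definition 6.14 of a parallel sampler, we therefore get

$$\Pr_S[\omega(B(\beta,S,\xi)) \geq \delta] \leq \varepsilon , \qquad (59)$$

where

$$B(\beta,S,\xi) = \Big\{\alpha^n \in [m]^n \;\Big|\; \frac{1}{|S|}\sum_{j\in S}\beta_j^{\alpha^n} \leq \frac{1}{n}\sum_{j=1}^{n}\beta_j^{\alpha^n} - \xi\Big\} .$$

Inequality (59) can be rewritten as

$$\Pr_S[\omega(\bar{B}(\beta,S,\xi)) \geq 1-\delta] \geq 1-\varepsilon , \qquad (60)$$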
where we write $\bar{B}(\beta,S,\xi) = [m]^n \setminus B(\beta,S,\xi)$ for the complement of $B(\beta,S,\xi)$.
Note that if $\alpha^n \in \bar{B}(\beta,S,\xi)$, then



\s\ 



n V ''TO 



by the definition of jdf and Theorem 16.41 This is equivalent to 

v^(T„,a") =iJ(0|i?)^ +Ei?(X,|X>,i?)^„. > A^'VlloglA-l (61) 

where 

7J(X>o|£;)p 1 n~\S\ 
71 log I A: I TO |6|7ilog|<Y| 

We use Lemma 16.31 ([yj) (with j = n) to bound the second summand from below, getting A" > A 
for all a" e [m]". With we conclude that 



v^(T„,a") > Al^lloglA-l for all a" S S(/3,5,C) • (62) 



In other words, we have $\bar{B}(\beta,S,\xi) \subset \Gamma(\lambda,S)$, and the claim follows from (60). □

6.5 Sampling and recombining: preservation of smooth entropy rate 

We will now turn our attention to the smooth min-entropy, as introduced in [Ren05]. We will state and prove our main result in this section; that is, we will show that the smooth min-entropy rate is preserved under sampling.

Before discussing our main result, we quickly review an important special case: We will often consider situations where a random variable $Z = f(X, Y)$ is the result of applying a function to two random variables $X$ and $Y$. An example of this is the case where $Z$ is a randomly chosen substring of $X$. To show that the uncertainty about $f(X,Y)$ is large given $Y$ and a quantum system $E$, it suffices to show that with high probability over $Y$, the uncertainty about $f(X, y)$ is large. This is expressed by the following result.

Lemma 6.17. Let $\rho_{ZYE}$ be such that

$$\Pr_y[H^{\delta}_{\min}(Z|E, Y=y) \geq k] \geq 1-\varepsilon .$$

Then $H^{\varepsilon+\delta}_{\min}(Z|YE) \geq k$.

The proof of this lemma is deferred to Appendix A.

Recall that the smooth min-entropy rate $R^{\varepsilon}_{\min}(A|B)_\rho$ is defined as in (5) as the smooth min-entropy $H^{\varepsilon}_{\min}(A|B)_\rho$ divided by the size $H_0(A)$ of $A$. Our main result is the following.

Theorem 6.18. Let $\rho_{X^nE}$ be a quantum state where $X^n = (X_1, \ldots, X_n)$ on $\mathcal{X}^n$ is classical. Let $S$ be a random variable over subsets of $[n]$ which is independent of $X^nE$ and a $(n,\xi,\delta,\varepsilon)$-parallel sampler. Assume that $\kappa = \frac{n}{|S|\log|\mathcal{X}|} \leq 0.15$. Then

R'J!+'+''+^(Xs\SE), > Rl,,,{X^\E), - A where 

$$\Delta = \xi + \frac{2\log 1/\theta}{n\log|\mathcal{X}|} + 2\kappa\log 1/\kappa ,$$

for all $\theta, \tau > 0$.
We will give concrete parameters below, which show that $\Delta$ is small (in some security parameter) in situations of interest. To put this result into a more convenient form, we choose a certain value of $\theta$, and show how this result applies to general samplers.



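Corollary 6.19. Let $\rho_{X^nE}$ be a quantum state as in Theorem 6.18 and let $S$ be a $(n,\xi,\varepsilon)$-sampler. Assume that $\kappa = \frac{n}{|S|\log|\mathcal{X}|} \leq 0.15$. Then

$$R^{\varepsilon'+\tau}_{\min}(X_S|SE)_\rho \geq R^{\tau}_{\min}(X^n|E)_\rho - 3\xi - 2\kappa\log 1/\kappa \qquad \text{with} \qquad \varepsilon' = 2\cdot 2^{-\xi n\log|\mathcal{X}|} + 3\varepsilon^{1/4}$$

for all $\tau > 0$.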

Proof. We choose $\theta = 2^{-\xi n\log|\mathcal{X}|}$. We can then bound $\Delta$ in Theorem 6.18 by

$$\Delta \leq 3\xi + 2\kappa\log 1/\kappa ,$$

and the claim follows from the fact that a $(n,\xi,\varepsilon)$-sampler is a $(n,\xi,\sqrt{\varepsilon},\sqrt{\varepsilon})$-parallel sampler (i.e., Lemma 6.15). □



In the remainder of this section, we prove Theorem 6.18. We do so in two successive steps. We first show that sampling preserves the entropy rate of a modified (smooth) entropy $h^{\varepsilon}_{\min}(A|B)$. We then use the fact that this modified entropy $h^{\varepsilon}_{\min}$ is essentially equivalent to the smooth min-entropy. More precisely, we introduce the quantities

^min 

{A\B)p = sup H{A\B)fL 

cb>Pb 

hf„i„(A|B)p = sup hmin(A|B)p 

PAB-\\PAB—PAB II <£ 
i'{pAB)<l 

for any bipartite state $\rho_{AB}$ and $\varepsilon > 0$. The only difference to the original definition of the (smooth) min-entropy $H^{\varepsilon}_{\min}(A|B)_\rho$ (Definition 2.5) is that the supremum is restricted to states $\sigma_B$ which are bounded from below by $\rho_B$. These quantities give the bounds

$$h^{\varepsilon+\delta}_{\min}(A|B)_\rho + 2\log 1/\varepsilon \geq H^{\delta}_{\min}(A|B)_\rho \geq h^{\delta}_{\min}(A|B)_\rho \qquad \text{for all } \varepsilon, \delta > 0 \qquad (63)$$

on the smooth min-entropy, for all states $\rho_{AB}$. Note that the second inequality follows trivially from the definition; we give a proof of the first inequality in Appendix A (Lemma A.1).

We are ready to combine the recombination theorem (Theorem 6.13) with the sampling theorem (Theorem 6.16). This gives the following main result, which shows that the min-entropy rate (for the modified entropy $h^{\varepsilon}_{\min}(A|B)_\rho$) is preserved under sampling.

Lemma 6.20. Consider a quantum state of the form $\rho_{X^nE}$ where $X^n = (X_1, \ldots, X_n)$ on $\mathcal{X}^n$ is classical. Let $P_S$ be a probability distribution over subsets of $[n]$ which is a $(n,\xi,\delta,\varepsilon)$-parallel sampler. Then



Pr 

5 



\i^^,{Xs\E)p ^ h.„i„(X»|g)p ■ 
l^lloglA-l - nloglA-l 



> 1 — e where (64) 
1 2n log m 



m \S\\og\X\ 

for any $m \in \mathbb{N}$. In particular, inequality (64) is true for the choice

$$c = \xi + 2\kappa\log 1/\kappa .$$
Proof. We first show the second part, assuming that the first statement is true. It is obtained by choosing a specific value of $m \in \mathbb{N}$. Consider the function $f(m) := \frac{1}{m} + 2\kappa\log m$. We are interested in the minimum value of this function for $m \in \mathbb{N}$. It is easy to see that the function is minimised for $m_{\min} = \frac{\ln 2}{2\kappa}$, where $\ln$ denotes the natural logarithm. However, since this is not necessarily an integer, we use the value $f(\frac{1}{\ln 2}\cdot m_{\min})$. Clearly, for $\kappa$ small enough ($\kappa \leq \frac{1}{2}(1-\ln 2) \approx 0.15$), there is an integer $m_0 \in [m_{\min}, \frac{1}{\ln 2}\cdot m_{\min}]$, and, because $f$ is non-decreasing on this interval, this integer satisfies

$$f(m_0) \leq f\big(\tfrac{1}{2\kappa}\big) = 2\kappa\log 1/\kappa .$$

The second claim immediately follows from this.
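The choice of $m_0$ can be reproduced numerically. The following Python sketch takes $\log$ to be the base-2 logarithm (as suggested by the appearance of $\ln 2$ in the minimiser) and picks the smallest integer in the interval $[m_{\min}, m_{\min}/\ln 2]$; the function names `f` and `choose_m0` are hypothetical and serve only this illustration.

```python
from math import ceil, log, log2

def f(m, kappa):
    """The function f(m) = 1/m + 2*kappa*log2(m) from the proof."""
    return 1.0 / m + 2.0 * kappa * log2(m)

def choose_m0(kappa):
    """Smallest integer in [m_min, m_min / ln 2], where m_min = ln(2) / (2*kappa)."""
    m_min = log(2) / (2.0 * kappa)
    m_max = 1.0 / (2.0 * kappa)  # equals m_min / ln 2
    m0 = ceil(m_min)
    if m0 > m_max:
        raise ValueError("no integer in [m_min, m_min / ln 2]; kappa is too large")
    return m0

for kappa in (0.15, 0.1, 0.01, 0.001):
    m0 = choose_m0(kappa)
    bound = 2.0 * kappa * log2(1.0 / kappa)
    print(kappa, m0, f(m0, kappa), bound)
```

For each listed value of $\kappa$ the printed value of $f(m_0)$ stays below $2\kappa\log 1/\kappa$, in line with the inequality used in the proof.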

We rephrase the first claim more explicitly: We have to show that the following holds with 
probability at least 1 — e over the choice of S. There is a subnormalised state px^ er (depending 
on iS) at distance 

■^Wpx^er - PX"Er\\ < 

from the original state px^ER and a state aE > Pe (which happens to be independent of S) such 
that 

H(Xs\E), > ^h,nin(X>o|S)p-|5|log|A'|(- +0-271 logm 
71 m 

H{%\E)ji > . 

Let <7e be a state which achieves the supremum in the definition of h^in(XyQ\E) r , i.e., we have 
hmin(^>o|-E^)p — H(X^q\E)r and aE > Pe- Then i/(0|£') p > by definition, and we can bound 
the quantity A in Theorem 16. 161 by 

71 log I A"! ra ' 

According to Theorem 6.16, the set $\Gamma(\lambda,S)$ of $\lambda$-good paths has weight at least $1-\delta$ with respect to the distribution $\omega$ (defined by Theorem 6.13), with probability at least $1-\varepsilon$ over the choice of $S$. The recombination theorem (Theorem 6.13) therefore guarantees the existence of a state $\bar{\rho}_{X^nER}$ with the required properties, except with probability $\varepsilon$ over the choice of $S$. □

We can now prove our main result. 

Proof of Theorem 6.18. First observe that Lemma 6.20 can be adapted using the triangle inequality to include an additional parameter $\gamma > 0$, thus replacing the probability in question by



Pr 
s 



i^'nfn^'iXslE), ^ hl,^^{X-\E), 



\S\\og\X\ - nlog\X\ 



Setting $\gamma = 2\theta + \tau$ and $\Delta = c + \frac{2\log 1/\theta}{n\log|\mathcal{X}|}$, this implies that



> 1 



Pr 

5 



HliI+''+^{Xs\E), ^ Hl,,^^{X-\E), 
\S\\og\X\ - n\og\X\ 



> 1 



because of the relations between $h_{\min}$ and $H_{\min}$. The claim of the theorem follows from this inequality and Lemma 6.17. □
Samp(Z, r, S):

Input: $L$-bit string $Z$, parameter $r$ and $\lceil\log\binom{L^{1/4}r}{r}\rceil$ independent random bits $S$.
Output: $L^{3/4}$-bit substring of $Z$.

Procedure: Partition $Z$ into $n = L/t$ blocks $Z = (X_1, \ldots, X_n)$ of $t = L^{3/4}/r$ bits each.
Use random bits to pick a subset $S \subset [n]$ of size $|S| = r$ at random. Output $X_S$, i.e.,
the concatenation of the corresponding blocks.



Figure 10: The subprotocol Samp. We slightly abuse notation by identifying the random bits with
the subset we choose at random.
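The procedure of Figure 10 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the string is represented as a Python `str` of `'0'`/`'1'` characters, $L$ is assumed to be a perfect fourth power with $r$ dividing $L^{3/4}$ so that all block sizes are integers, and the subset is drawn with a pseudo-random generator instead of being decoded from $\lceil\log\binom{n}{r}\rceil$ seed bits. The name `samp` and its signature are hypothetical.

```python
import random

def samp(z, r, rng):
    """Return (subset, substring): r randomly chosen blocks of z, concatenated.

    z   : bit string of length L (L must be a perfect fourth power)
    r   : number of blocks to sample (must divide L^(3/4))
    rng : a random.Random instance standing in for the seed S
    """
    L = len(z)
    k = round(L ** 0.25)                 # k = L^(1/4)
    if k ** 4 != L or (k ** 3) % r != 0:
        raise ValueError("need L = k^4 with r dividing k^3")
    t = k ** 3 // r                      # block length t = L^(3/4) / r
    n = L // t                           # number of blocks n = L^(1/4) * r
    subset = sorted(rng.sample(range(n), r))
    out = "".join(z[i * t:(i + 1) * t] for i in subset)
    return subset, out

# Toy usage: L = 256 = 4^4 and r = 4, so t = 16, n = 16 and the output has 64 bits.
rng = random.Random(0)
z = "".join(rng.choice("01") for _ in range(256))
subset, out = samp(z, 4, rng)
print(subset, len(out))
```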



7 Recursive sampling and the bounded storage model 

We will now consider the random subset sampler and analyse recursive sampling from a string. This will result in a concrete protocol for the bounded storage model which achieves significant key expansion.

The basic building block is the subprotocol Samp described in Figure 10. This protocol outputs a random substring of a given string $Z$. The effect of protocol Samp is the following.



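Lemma 7.1. Let $r$ be fixed, let $\rho_{ZE}$ be a quantum state where $Z$ is an $L$-bit string with $L \geq r^4$. Let $S$ be an independent, uniform $\lceil\log\binom{L^{1/4}r}{r}\rceil$-bit string. (In particular, these are less than $r\log L$ bits.) Let $Z'$ be the $L^{3/4}$-bit string $Z' = \mathrm{Samp}(Z,r,S)$. Then

$$R^{\varepsilon'+\tau}_{\min}(Z'|ES)_\rho \geq R^{\tau}_{\min}(Z|ES)_\rho - 5\frac{\log r}{r^{1/4}} ,$$

for all $\tau > 0$, where $\varepsilon' = 5\cdot 2^{-\sqrt{r}/8}$.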

Note that we could have used any $(n,\xi,\varepsilon)$-sampler in place of the subset sampler. However, for concreteness and simplicity, we restrict our attention to this sampler; in practice, more efficient constructions may be used, and the analysis is analogous.



Proof. To prove the bound $r\log L$ on the number of bits consumed, we use the inequality $\binom{n}{r} \leq (en/r)^r$ on the binomial coefficients. It implies that

$$\Big\lceil\log\binom{L^{1/4}r}{r}\Big\rceil \leq \big\lceil\log (L^{1/4}e)^r\big\rceil \leq r\big(\tfrac{1}{4}\log L + \log e\big) \leq r\log L .$$

We express everything in terms of the number $r$ of subblocks we sample, and the length $L^{3/4}$ of the final string. That is, we sample $r \leq n$ subblocks from $n = \frac{L}{t} = L^{1/4}r$ blocks of size $t = \frac{L^{3/4}}{r}$ bits each, obtaining a substring of $rt = L^{3/4}$ bits.

We know from Lemma 2.2 that a randomly chosen subset $S$ of size $r \leq n$ is a $(n,\xi,e^{-r\xi^2/2})$-sampler, for every $\xi \in [0,1]$. We choose $\xi = r^{-1/4}$ such that $e^{-r\xi^2/2} = e^{-\sqrt{r}/2}$. The parameter $\varepsilon'$ in Corollary 6.19 then takes the form

$$\varepsilon' = 2\cdot 2^{-\xi n\log|\mathcal{X}|} + 3\big(e^{-\sqrt{r}/2}\big)^{1/4} \leq 5\cdot 2^{-\sqrt{r}/8} .$$

Similarly, we have $\kappa = \frac{r}{L^{1/2}}$. Because the function $\kappa \mapsto 2\kappa\log 1/\kappa$ is monotonically increasing for small enough $\kappa$, its value is maximised for small values of $L$; that is, we can use our lower bound
ReSamp(Z, f, r, S):

Input: $L$-bit string $Z$, parameters $r$ and $f$ and independent random bits $S$.
Output: $L^{(3/4)^f}$-bit substring of $Z$.

Procedure: Let $Z^{(0)} = Z$. Iterate the following, for $i = 1, \ldots, f$:

Use independent random bits $S^{(i)}$ from $S$ to generate $Z^{(i)} = \mathrm{Samp}(Z^{(i-1)}, r, S^{(i)})$.
Output $Z^{(f)}$.



Figure 11: The protocol ReSamp. It calls the subprotocol Samp $f$ times, each time producing
a substring of the already generated string. We will determine the amount of randomness this
protocol needs below. Note that the output of this recursive protocol can be computed with
limited storage. In particular, it is unnecessary to store the intermediate substrings $Z^{(i)}$.



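$L \geq r^4$ on $L$ to get

$$2\kappa\log 1/\kappa = 2\frac{r}{L^{1/2}}\log\frac{L^{1/2}}{r} \leq 2\frac{\log r}{r} .$$

We conclude that

$$3\xi + 2\kappa\log 1/\kappa \leq \frac{3}{r^{1/4}} + \frac{2}{r}\log r \leq 5\frac{\log r}{r^{1/4}} .$$

The claim then follows from Corollary 6.19. □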
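The quantities appearing in this proof can be evaluated for concrete parameters. The Python sketch below assumes the parametrisation $n = L^{1/4}r$, $\kappa = r/L^{1/2}$ and $\xi = r^{-1/4}$ used above, with all logarithms taken to base 2; the function name `lemma71_quantities` and the convention of passing $L^{1/4}$ as an integer are hypothetical choices for this illustration.

```python
from math import ceil, comb, log2

def lemma71_quantities(L_quarter, r):
    """Evaluate the seed length and the entropy-rate loss for L = L_quarter**4.

    Returns (seed_bits, seed_bound, loss, claimed_loss) where
      seed_bits    = ceil(log2(binom(n, r))) with n = L^(1/4) * r,
      seed_bound   = r * log2(L),
      loss         = 3*xi + 2*kappa*log2(1/kappa) with xi = r^(-1/4), kappa = r / L^(1/2),
      claimed_loss = 5 * log2(r) / r^(1/4).
    """
    L = L_quarter ** 4
    if L < r ** 4:
        raise ValueError("the analysis assumes L >= r^4")
    n = L_quarter * r
    seed_bits = ceil(log2(comb(n, r)))
    seed_bound = r * log2(L)
    kappa = r / L_quarter ** 2
    xi = r ** -0.25
    loss = 3 * xi + 2 * kappa * log2(1 / kappa)
    claimed_loss = 5 * log2(r) / r ** 0.25
    return seed_bits, seed_bound, loss, claimed_loss

# Toy usage: L = 16^4 = 65536 and r = 8.
print(lemma71_quantities(16, 8))
```

For such small values of $r$ the bound on the loss exceeds 1 and is therefore vacuous; the statement only becomes meaningful for large $r$.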

Note that the length $L$ is only reduced to $L^{3/4}$ by the protocol Samp. To reduce the length of the output even further, we use the protocol recursively. That is, we randomly sample substrings $f$ times, each time sampling a substring of the already obtained string. In each step, the original string is partitioned into a certain number of blocks, out of which $r$ are chosen at random (throughout, $r$ will be a fixed parameter). We call the resulting protocol ReSamp; see Figure 11.
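The recursion just described can be sketched as follows. As before, this is an illustration under simplifying assumptions rather than the protocol itself: bit strings are Python `str` objects, every intermediate length is assumed to be a perfect fourth power with $r$ dividing its $3/4$-th power, and a pseudo-random generator stands in for the seed bits $S^{(i)}$. The names `samp` and `resamp` are hypothetical.

```python
import random

def samp(z, r, rng):
    """One round: keep r randomly chosen blocks of length L^(3/4)/r out of L^(1/4)*r blocks."""
    L = len(z)
    k = round(L ** 0.25)
    if k ** 4 != L or (k ** 3) % r != 0:
        raise ValueError("need L = k^4 with r dividing k^3")
    t = k ** 3 // r
    n = L // t
    chosen = sorted(rng.sample(range(n), r))
    return "".join(z[i * t:(i + 1) * t] for i in chosen)

def resamp(z, f, r, rng):
    """Apply samp f times, each time to the substring produced by the previous round."""
    for _ in range(f):
        z = samp(z, r, rng)
    return z

# Toy usage: L = 65536 = 16^4, r = 8, f = 2.
# Round 1 yields 4096 = 8^4 bits, round 2 yields 512 bits = L^((3/4)^2).
rng = random.Random(1)
z = "".join(rng.choice("01") for _ in range(65536))
out = resamp(z, 2, 8, rng)
print(len(out))
```

This sketch keeps each intermediate string in memory for simplicity; a storage-limited implementation would instead compute the final block positions from the chosen subsets and read only those bits of $Z$.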

To understand the effect of this recursive protocol, observe that the quantities in Lemma 7.1 describing the effect of Samp are all additive in the following sense: repeated application of Samp simply requires addition of the parameters. Moreover, with the chosen parameters, only the number of bits consumed depends on the length of the involved bitstrings. The analysis is therefore particularly simple, and the effect of the procedure ReSamp is described by the following lemma.

Lemma 7.2. Let $\rho_{ZE}$ be such that $Z$ is an $L$-bit string. Let $f$ and $L$ be such that $L^{(3/4)^f} \geq r^4$. Let $S$ be independent random bits and let $Z' = \mathrm{ReSamp}(Z, f, r, S)$. Then $Z'$ is a $L^{(3/4)^f}$-bit substring of $Z$, with

$$R^{\gamma+\tau}_{\min}(Z'|ES)_\rho \geq R^{\tau}_{\min}(Z|ES)_\rho - 5f\frac{\log r}{r^{1/4}} \qquad (65)$$

where $\gamma = 5f\cdot 2^{-\sqrt{r}/8}$. The generation of $Z'$ consumes less than $fr\log L$ independent random bits from $S$.

In particular, if L ~ 2'', then approximately f « log 4/3 -^^S ( 4 log r ) o.ppHcations of the sub- 
protocol Samp are sufficient to produce a substring Z' of Z of length < r^, while preserving the 
min- entropy-rate (for large r). This consumes less than r^ independent random bits. 
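The number of rounds can be computed directly for $L = 2^r$. The following Python sketch works with $\log_2$ of the string length, finds the largest $f$ for which the condition $L^{(3/4)^f} \geq r^4$ of Lemma 7.2 still holds, and compares it with the closed-form estimate $\log_{4/3}\big(r/(4\log r)\big)$. The function names are hypothetical, and reading the condition of the lemma as the stopping rule is an assumption of this illustration.

```python
from math import floor, log, log2

def max_rounds(r):
    """Largest f with L^((3/4)^f) >= r^4 for L = 2^r, found by direct iteration."""
    log_len = float(r)          # log2 of the current string length
    target = 4 * log2(r)        # log2 of r^4
    f = 0
    while 0.75 * log_len >= target:
        log_len *= 0.75
        f += 1
    return f

def rounds_estimate(r):
    """Closed-form estimate log_{4/3}( r / (4 * log2(r)) )."""
    return log(r / (4 * log2(r))) / log(4 / 3)

for r in (64, 1024):
    print(r, max_rounds(r), rounds_estimate(r))
```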

Proof. Let $L^{(i)}$ be the length of the string $Z^{(i)}$; by the definition of the protocol Samp, we have $L^{(i)} = (L^{(i-1)})^{3/4}$ and hence $L^{(f)} = L^{(3/4)^f}$, as claimed. Let $S^{(i)}$ be the random bits from $S$ used to generate $Z^{(i)}$. According to Lemma 7.1 we have for all $i = 1, \ldots, f$

B^'^'iZ^^^lES^'^ ■ ■ ■ > (Z(^-I) • • • 5(^-i))p 



. logr 
'7174 



where 5 = 4- 2^^/^ and e'*^-'^^ is arbitrary. Defining e^"^ = e and e*^*^ = e^'^-'^) + S, this implies 
the claim 

To compute the number of random bits consumed in this procedure, observe that in the $i$-th step, a random subset $S^{(i)}$ of $[n^{(i)}]$ of size $r$ is chosen, where $n^{(i)} = r(L^{(i-1)})^{1/4}$; this consumes $\lceil\log\binom{n^{(i)}}{r}\rceil$ bits.

The total number of bits we need is bounded by the number $f$ of applications times the maximal number of bits

$$\max_{n^{(i)}}\Big\lceil\log\binom{n^{(i)}}{r}\Big\rceil = \Big\lceil\log\binom{n^{(1)}}{r}\Big\rceil \leq r\log L$$

consumed in a single step. (Here we used the fact that $n^{(i)}$ is a decreasing sequence and the bound on the binomial coefficient shown in Lemma 7.1.) This gives the upper bound $fr\log L$, as claimed. □



In summary, we have found a procedure that generates a random substring of a string of 2^ 
bits, with the following parameters: 





original Z 


substring Z' 


seed S 


length (bits) 
entropy-rate 


R (arbitrary) 







with error poly{log{r))e~^''^\ It is computable with poly{r) bits of storage. Subsequent to 
this sampling procedure, privacy amplification may be used to extract a secret key from the 
substring Z' . In conclusion, this gives a sample-and-hash procedure for key expansion in the 
bounded storage model, expanding an initial key of bits to approximately r^{R— l/r^^^^^) bits, 
(i? S [0, 1] is usually assumed to be constant.) 



A Additional proofs related to entropies 

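Proof of Lemma 6.17. Let $\rho_{YZE} = \sum_y P_Y(y)|y\rangle\langle y| \otimes \rho_{ZE|Y=y}$. Let

$$\mathcal{G} := \{y \in \mathcal{Y} \mid H^{\delta}_{\min}(Z|E,Y=y) \geq k\} .$$

For every $y \in \mathcal{G}$, there is a subnormalised state $\bar{\rho}^{y}_{ZE}$ and a state $\sigma^{y}_{E}$ such that

$$\tfrac{1}{2}\|\bar{\rho}^{y}_{ZE} - \rho_{ZE|Y=y}\| \leq \delta , \qquad \bar{\rho}^{y}_{ZE} \leq 2^{-k}\,\mathrm{id}_Z \otimes \sigma^{y}_{E}$$

by definition. For every $y \notin \mathcal{G}$, we choose arbitrary states $\bar{\rho}^{y}_{ZE}$ and $\sigma^{y}_{E}$ satisfying

$$\bar{\rho}^{y}_{ZE} \leq 2^{-k}\,\mathrm{id}_Z \otimes \sigma^{y}_{E} .$$

It is easy to verify that the two states

$$\bar{\rho}_{YZE} = \sum_y P_Y(y)|y\rangle\langle y| \otimes \bar{\rho}^{y}_{ZE} , \qquad \sigma_{YE} = \sum_y P_Y(y)|y\rangle\langle y| \otimes \sigma^{y}_{E}$$

satisfy

$$\bar{\rho}_{YZE} \leq 2^{-k}\,\mathrm{id}_Z \otimes \sigma_{YE} , \qquad \tfrac{1}{2}\|\bar{\rho}_{YZE} - \rho_{YZE}\| \leq \delta + \varepsilon .$$

The claim follows from this. □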



Footnote to the proof of Lemma 7.2: Note that we are only interested in the approximate number of bits we need; see [DHRS04] for a dense encoding of subsets of $[n]$ into bitstrings.
We next prove the non-trivial inequality in (63).

Lemma A.1. Let $\rho_{AB}$ be a subnormalised state. We have

$$h^{\varepsilon+\delta}_{\min}(A|B)_\rho \geq H^{\delta}_{\min}(A|B)_\rho - 2\log 1/\varepsilon . \qquad (66)$$

Proof. Clearly, it suffices to show that the claim of the lemma is true for $\delta = 0$.
By definition, there is a normalised state $\sigma_B$ such that

$$\rho_{AB} \leq 2^{-H_{\min}(A|B)_\rho}\,\mathrm{id}_A \otimes \sigma_B .$$

This implies that 

Pb ~ Pb 

Let Pb denote the projector onto the eigenspaces of — corresponding to eigenvalues smaller than 

Pb 

or equal to 1/e'^. Applying this on both sides of the previous inequality gives (with Pb < ids) 

2-H^.AA\B,. 

Pb Pb < 5 ids ■ 

PB £^ 

1/2 

Multiplying from both sides by pg , we obtain 

PAB < 5 PB , (67) 

where we introduced the operator pab = P^b^ Pb^^^^PbP^b^ ■ We claim that 

Pb < Pb , (68) 
tr{pAB) < 1 and (69) 

^Wpab - PabW < £ ■ (70) 

Note that §7^-^ imply the claim §Q (for (5 = 0). 

Inequality jMl) directly follows from the fact that trA{^) = ^ < ids- To prove let 
I^ASc) be a purification of pab and consider the purification 

l^ABc) ^pTPbPb'^^I'^abc) 

of PAB- Using the Schmidt decomposition I'^abc) = 'J2\^^\-^)ac\^)b, it is straightforward to 
verify that 

(^ABcl^ABc) = tr(PBPs) and {"^abcI^abc) < tr{PBPB) < tr(pB) . (71) 
Note that the latter of these inequalities proves ((69|) . The Cauchy-Schwarz inequality J2i=i — 



VdyJ2i=i implies that for any two pure states |x),|(/3), we have 



where ||A||2 — ^tr{A^ A), since the difference |<y5)((y9| — |x)(xl tias rank at most 2. Applying this 
to l^ABc) and |^abc) and using ([71]) gives 



|^'ABc)(*Asc| - \'^abc){^abc\\\ < V2Vtr(ps)2 -tr(PBPB)2 



= \/2v/(tr(pB) - tr{PBPB)) {tr{pB) + tr{PBPB)) 



< 2Jtr{P^pB) . 
Here $P_B^{\perp} = \mathrm{id}_B - P_B$ projects onto the orthogonal complement of the image of $P_B$. In the last inequality, we have used the assumption that $\rho_{AB}$ is subnormalised and thus $\mathrm{tr}(\rho_B) \leq 1$. Since the trace distance is non-increasing under partial traces, the claim (70) follows once we show that

$$\mathrm{tr}(P_B^{\perp}\rho_B) \leq \varepsilon^2 .$$

Note that we have -^iPb ^ ^b^^b definition. Inserting this into the expression of interest 
gives 

tr{P^PB) < shr (^P^^P^pB^ < shr (^^pB^ < sM^b) = , 

as claimed. 

□ 

B Additional lemmas 

Lemma B.l. Let {Q"}ae[m] family of Hermitian operators on a Hilbert space A(>^ B and 
suppose that IrsiQ"' PabQ") < (^A, for any a e [m]. Then 

trBiQpAsQ) < m'^cTA 

for Q :— J2a Q"'- ^"^ particular, if {Q""} a£[m\ resolves the identity on supp(pyiB) then pA < m^OA- 
Proof. Let \(Pa) G A be arbitrary. By definition, we have 

tr {X.rB{QpABQ)WA){fA\) = {QpabQ{Wa){^a\ ® ids)) 

= Y.Xr{Q^PABQ^{WA){^A\®\AB)) ■ (72) 

By the cyclicity of the trace and the fact that ^ ids is a projector, we have 

tr {Q'-PabQ^{\^a){'Pa\ <»\dB)) ^triZ^iZ")^) , 

1 /2 

with = {\LpA){fA \ ^ The operator-Cauchy-Schwarz-inequality 

tr{EF) < ^tr{E'iE)tr{F^F) 
applied to E ^ Z"' and F = (Z^')t therefore gives (with the cyclicity of the trace) 

tr {Q''pabQ^{\va){^a\ ® ids)) < ^tr((Z")tZ")tr((Z^)tZ/3) . (73) 
It is straightforward to verify that 

tr((Z")tZ") ^U{trB{Q"pABQ'')\iPA){iPA\)<XriaA\iPA){iPA\) for all a e [m] , (74) 
where we used the assumption in the last inequality. Combining ([7^ with ([75]) and ([7^ gives 

tr {■trB{QpABQ)\'fA) ifAl) <ir{m'^aA\'fA){ipA\) ■ 
Since \ipA) G A was arbitrary, the claim follows. □ 
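Lemma B.2. Let $P$ and $P'$ be two projectors on a Hilbert space $\mathcal{H}$. Then
(i) If $\mathrm{supp}\,P \subset \mathrm{supp}\,P'$, then $P \leq P'$.
(ii) If $P \leq P'$, then $PP' = P'P = P$.
Proof. Both statements follow immediately from the fact that

$$\mathrm{supp}\,P' = \mathrm{supp}\,P \oplus (\mathrm{supp}\,P)^{\perp} ,$$

where $(\mathrm{supp}\,P)^{\perp}$ is the orthogonal complement of $\mathrm{supp}\,P$ in $\mathrm{supp}\,P'$. This identity implies $P' = P + \mathrm{id}_{(\mathrm{supp}\,P)^{\perp}}$, where $\mathrm{id}_{(\mathrm{supp}\,P)^{\perp}}$ is the projector onto $(\mathrm{supp}\,P)^{\perp}$. Thus $PP' = P'P = P$. □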